Chapter 1

Spacetime 

1.1 Three models of spacetime

“The test of a first-rate intelligence is the ability to hold two op- 
posing ideas in mind at the same time and still retain the ability to 
function.” — F. Scott Fitzgerald 





a / Three views of spacetime. 1. A
typical graph of a particle’s motion:
an oscillation. 2. In relativity,
it’s customary to swap the axes,
and 3. we can even remove the
axes entirely.


Time and space together make spacetime, figure a, the stage on 
which physics is played out. Until 1905, physicists were trained to 
accept two mutually contradictory theories of spacetime. I’ll call 
these the Aristotelian and Galilean views, although my colleagues 
from that era would have been offended to be accused of even partial 
Aristotelianism. 
b / 1. An observer and two clocks.
2. Idealization as events. 3. Vec- 
tors used to represent relation- 
ships between events. 




c / Valid vectors representing 
observers and simultaneity, ac- 
cording to the Aristotelian model 
of spacetime. 


1.1.1 Aristotelian spacetime 

Figure b/1 shows an observer and two clocks, represented using 
the graphical conventions of figure a/3. The existence of such a 
material object at a certain place and time constitutes an event, 
which we idealize as a point, b/2. Spacetime consists of the set of 
all events. As time passes, a physical object traces out a continuous 
curve, a set of events known in relativistic parlance as its world- 
line. Since paper and computer screens are two-dimensional, the 
drawings only represent one dimension of space plus one dimension 
of time, which in relativity we call “1+1 dimensions.” The real 
universe has three spatial dimensions, so real spacetime has 3+1 
dimensions. Most, but not all, of the interesting phenomena in 
special relativity can be understood in 1+1 dimensions, so whenever 
possible in this book I will draw 1+1-dimensional figures without 
apology or explanation. 

The relativist’s attitude is that events and relationships between 
events are primary, while coordinates such as x and t are secondary 
and possibly irrelevant. Coordinates let us attach labels like (x, t) to
points, but this is like God asking Adam to name all the birds and 
animals: the animals didn’t care about the names. Figure b/3 shows 
the use of vectors to indicate relationships between points. Vector 
o is an observer-vector, connecting two points on the world-line of 
the person. It points from the past into the future. The vector s 
connecting the two clocks is a vector of simultaneity. The clocks 
have previously been synchronized side by side, and if we assume 
that transporting them to separate locations doesn’t disrupt them, 
then the fact that both clocks read two minutes after three o’clock 
tells us that the two events occur at the same time. 

The Aristotelian model of spacetime is characterized by a set 
of rules about what vectors are valid observer- and simultaneity- 
vectors. We require that every o vector be parallel to every other, 
and likewise for s vectors. But, as is usual with vectors, we allow 
the arrow to be drawn anywhere without considering the different 
locations to have any significance; that is, our model of spacetime 
doesn’t allow different regions to have different properties. 

When Einstein was a university student, these rules (phrased dif- 
ferently) were the ones he was taught to use in describing electricity 
and magnetism. He later recalled imagining himself on a motorcycle,
riding along next to a light wave and trying to imagine how his
observations could be reconciled with Maxwell’s equations. I don’t 
know whether he was ever brave enough to describe this daydream 
to his professors, but if he had, their answer would have been essen- 
tially that his hypothetical o vector was illegal. The good o vectors 
were thought to be the ones that represented an observer at rest 
relative to the ether, a hypothetical all-pervasive medium whose vi- 
brations were electromagnetic waves. However silly this might seem 
to us a hundred years later, it was in fact strongly supported by the 
evidence. A vast number of experiments had verified the validity of 
Maxwell’s equations, and it was known that if Maxwell’s equations 
were valid in coordinates (x, t) defined by an observer o, they would 
become invalid under the transformation (x', t') = (x + vt, t) to
coordinates defined by an observer o' in motion at velocity v relative
to o. 
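
One way to see the conflict concretely is to apply the transformation to a light pulse. The following Python sketch is only an illustration: it assumes units in which light travels at speed 1 according to o, and the function name galilean_transform and the sample velocity v = 0.25 are hypothetical choices, not taken from the text. Maxwell's equations predict the same speed for the pulse in any coordinates in which they hold, but the transformed world-line has a different slope.

```python
def galilean_transform(x, t, v):
    """Apply (x', t') = (x + v*t, t), the transformation quoted in the text."""
    return x + v * t, t

# Two events on the world-line of a light pulse, x = t (light speed = 1 for o).
event_a = (0.0, 0.0)   # (x, t)
event_b = (1.0, 1.0)

v = 0.25  # hypothetical velocity of o' relative to o
xa, ta = galilean_transform(*event_a, v)
xb, tb = galilean_transform(*event_b, v)

speed_in_o = (event_b[0] - event_a[0]) / (event_b[1] - event_a[1])
speed_in_o_prime = (xb - xa) / (tb - ta)

print(speed_in_o)        # speed of the pulse according to o
print(speed_in_o_prime)  # speed according to o'; differs from 1 whenever v != 0
```

With these numbers the pulse moves at speed 1 for o and 1.25 for o', so at most one of the two observers can be using coordinates in which Maxwell's equations hold.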


1.1.2 Galilean spacetime 

But the Aristotelian model was already known to be wrong when 
applied to material objects. The classic empirical demonstration of 
this fact came around 1610 with Galileo’s discovery of four moons 
orbiting Jupiter, figure d. Aristotelianism in its ancient form was 
originally devised as an explanation of why objects always seemed to 
settle down to a natural state of rest according to an observer standing
on the earth’s surface. But as Jupiter flew across the heavens,
its moons circled around it, without showing any natural tendency
to fall behind it like a paper cup thrown out the window of a car.
Just as an observer o1 standing on the earth would consider the
earth to be at rest, o2 hovering in a balloon at Jupiter’s cloudtops
would say that the jovian clouds represented an equally “natural”
state of rest.




d / A simulation of how Jupiter
and its moons might appear at 
intervals of three hours through 
a telescope. Because we see 
the moons’ circular orbits edge- 
on, their world-lines appear sinu- 
soidal. Over this time period, the 
innermost moon, Io, completes
half a cycle. 
e / Valid vectors representing
observers and simultaneity, ac- 
cording to the Galilean model of 
spacetime. 



f / Example 1. 


We are thus led to a different, Galilean, set of rules for o and 
s vectors. All s vectors are parallel to one another, but any vector 
that is not parallel to an s vector is a valid o vector. (We may 
wish to require that it point into the future rather than the past, 
but Newton’s laws are symmetric under time-reversal, so this is not 
strictly necessary.) 
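
The two sets of rules can be compared side by side in a short Python sketch. It is written under an assumption that the text deliberately avoids making: that we have already chosen coordinates (dt, dx) in which s vectors have dt = 0 and the Aristotelian observers have dx = 0. The function names are hypothetical.

```python
def is_simultaneity_vector(dt, dx):
    """In both models, an s vector connects distinct events at the same time."""
    return dt == 0 and dx != 0

def is_aristotelian_observer(dt, dx):
    """Aristotelian rule: every o vector is parallel to every other,
    i.e. the observer is at rest (dx == 0) in the preferred frame."""
    return dt != 0 and dx == 0

def is_galilean_observer(dt, dx):
    """Galilean rule: any vector not parallel to an s vector is a valid o."""
    return dt != 0

at_rest = (1.0, 0.0)        # (dt, dx)
moving = (1.0, 0.5)
simultaneous = (0.0, 2.0)

for name, vec in [("at rest", at_rest), ("moving", moving), ("simultaneous", simultaneous)]:
    print(name, is_aristotelian_observer(*vec), is_galilean_observer(*vec))
```

The only vector on which the two models disagree is the moving one: it is an illegal o vector for the Aristotelian and a perfectly good one for the Galilean.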

Galilean spacetime, unlike Aristotelian spacetime, has no univer- 
sal notion of “same place.” I can drive to Gettysburg, Pennsylvania, 
and stand in front of the brass plaque that marks the site of the mo- 
mentous Civil War battle. But am I really in the same place? An 
observer on another planet would say that our planet had moved 
through space since 1863. 

Note that our geometrical description includes a notion of paral- 
lelism, but not of angular measure. We don’t know or care whether 
the “angle” between an s and an o is 90 degrees. One represents 
a distance, while the other represents an interval of time, and we 
can’t define the angle between a distance and a time. The same was 
true in the Aristotelian model; the vectors in figure c were drawn 
perpendicular to one another simply as a matter of convention, but 
any other angle could have been used. 

The Galilean twin paradox Example 1 

Alice and Betty are identical twins. Betty goes on a space voyage,
traveling away from the earth along vector o1 and then turning
around and coming back on o2. Meanwhile, Alice stays on
earth. Because this is an experiment involving material objects, 
and the conditions are similar to those under which Galilean rel- 
ativity has been repeatedly verified by experiment, we expect the 
results to be consistent with Galilean relativity’s claim that mo- 
tion is relative. Therefore it seems that it should be equally valid 
to consider Betty and the spaceship as having been at rest the 
whole time, while Alice and the planet earth traveled away from 
the spaceship along o3 and then returned via o4. But this is not
consistent with the experimental results, which show that Betty 
undergoes a violent acceleration at her turnaround point, while 
Alice and the other inhabitants of the earth feel no such effect. 

The paradox is resolved by realizing that Galilean relativity defines
unambiguously whether or not two vectors are parallel. It’s
true that we could fix a frame of reference in which o1 represented
the spaceship staying at rest, but o2 is not parallel to o1, so in this
frame we still have a good explanation for why Betty feels an
acceleration: she has gone from being at rest to being in motion.

Regardless of which frame of reference we pick, and regardless 
of whether we even fix a frame of reference, o3 and o4 are parallel
to one another, and this explains why Alice feels no effect. 
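
The claim that parallelism does not depend on the frame can be checked numerically. In this Python sketch the velocities are made-up numbers, the vectors are written as (dt, dx), and a Galilean change of frame is assumed to subtract a constant velocity u from every world-line while leaving dt alone. The helper names parallel and change_frame are hypothetical.

```python
def parallel(a, b, tol=1e-12):
    """Vectors (dt, dx) are parallel when their 2D cross product vanishes."""
    return abs(a[0] * b[1] - a[1] * b[0]) < tol

def change_frame(vec, u):
    """Galilean change of frame: subtract velocity u, leaving dt unchanged."""
    dt, dx = vec
    return (dt, dx - u * dt)

# Hypothetical numbers in the earth's frame, written as (dt, dx).
o1, o2 = (1.0, 0.5), (1.0, -0.5)   # Betty: outbound, then inbound
o3, o4 = (1.0, 0.0), (1.0, 0.0)    # Alice: on earth throughout

# Earth's frame, the frame of Betty's outbound leg, and an arbitrary third frame.
for u in (0.0, 0.5, -3.0):
    b1, b2, a3, a4 = (change_frame(w, u) for w in (o1, o2, o3, o4))
    print(u, parallel(b1, b2), parallel(a3, a4))
```

In every frame tried, Betty's two vectors fail the test and Alice's pass it, which is the frame-independent fact that distinguishes the twins.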






1.1.3 Einstein’s spacetime 

We have two models of spacetime, neither of which is capable 
of describing all the phenomena we observe. Because of the rela- 
tively crude state of technology ca. 1900, it required considerable 
insight for Einstein to piece together a fragmentary body of indirect 
evidence and arrive at a consistent and correct model of spacetime. 
Today, the evidence is part of everyday life. For example, every 
time you use a GPS receiver, you’re using Einstein’s theory of rela- 
tivity. Somewhere between 1905 and today, technology became good 
enough to allow conceptually simple experiments that students in 
the early 20th century could only discuss in terms like “Imagine that 
we could. . . ” 

A good jumping-on point is 1971. In that year, J.C. Hafele and 
R.E. Keating brought atomic clocks aboard commercial airliners, 
figure g, and went around the world, once from east to west and 
once from west to east. Hafele and Keating observed that there was 
a discrepancy between the times measured by the traveling clocks 
and the times measured by similar clocks that stayed home at the 
U.S. Naval Observatory in Washington. 1 The east-going clock lost 
time, ending up off by -59 ± 10 nanoseconds, while the west-going
one gained 273 ± 7 ns. 

We are used to thinking of time as absolute and universal, so 
it is disturbing to find that it can flow at different rates for different
observers. Nevertheless, the effects that Hafele and Keating
observed were small. This makes sense: Galilean relativity had al- 
ready been thoroughly verified for material objects such as clocks, 
planets, and airplanes, so a new theory like Einstein’s had to agree 
with Galileo’s to a good approximation, within the Galilean theory’s 
realm of applicability. This requirement of backward-compatibility 
is known as the correspondence principle. 

It’s also reassuring that the effects on time were small compared 
to the three-day lengths of the plane trips. There was therefore no 
opportunity for paradoxical scenarios such as one in which the east- 
going experimenter arrived back in Washington before he left and 
then convinced himself not to take the trip. A theory that maintains 
this kind of orderly relationship between cause and effect is said to 
satisfy causality. 2 
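
To put a number on "small," the quoted clock discrepancies can be compared with the duration of the flights. This Python sketch uses only the figures given above (-59 ns, 273 ns, and trips of about three days); treating each trip as exactly three days is a simplifying assumption.

```python
NS = 1e-9
SECONDS_PER_DAY = 86_400

trip_duration = 3 * SECONDS_PER_DAY   # "three-day lengths" from the text, in seconds

shifts_ns = {"east-going": -59, "west-going": 273}   # measured values quoted above

for label, shift in shifts_ns.items():
    fraction = shift * NS / trip_duration
    print(f"{label}: {shift} ns is a fractional change of {fraction:.1e}")
```

Both fractions come out at roughly one part in a trillion, which is why atomic clocks were needed and why no paradoxical reordering of departure and arrival could arise.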

Hafele and Keating were testing specific quantitative predictions 
of relativity, and they verified them to within their experiment’s er- 
ror bars. Let’s work backward instead, and inspect the empirical 
results for clues as to how time works. The disagreements among 
the clocks suggest that simultaneity is not absolute: different
observers have different notions of simultaneity, as suggested in figure
i. Just as Galilean relativity freed the o vectors from the constraint
of being parallel to one another, Einstein frees the s vectors. Galileo
made “same place” into an ambiguous concept, while Einstein did
the same with “simultaneous.” But because a particular observer
does have methods of synchronizing clocks (e.g., Einstein
synchronization, example 4, p. 18), the definition of simultaneity isn’t
completely arbitrary. For each o vector we have a corresponding s
vector, which represents that observer’s opinion as to what constitutes
simultaneity. Because the convention on a Cartesian x-t graph is to
draw the axes at right angles to one another, we refer to such a pair
of vectors as orthogonal, but the word is not to be interpreted
literally, since we can’t define an actual angle between a time interval
and a spatial displacement.

1 There were actually several effects at work, but these details do not affect
the present argument, which only depends on the fact that there is no absolute
time. See p. 122 for more on this topic.

2 For more about causality, see section 2.1, p. 43.

g / The clock took up two seats,
and two tickets were bought for it
under the name of “Mr. Clock.”

h / All three clocks are moving
to the east. Even though the
west-going plane is moving to the
west relative to the air, the air
is moving to the east due to the
earth’s rotation.

i / According to Einstein, simultaneity
is relative, not absolute.


j / Possibilities for the behavior of orthogonality. Panel labels: Galileo, “rotational,” Einstein.


What, then, are the rules for orthogonality? Figure j shows three 
possibilities. In each case, we have an initial pair of vectors o1 and
s1 that we assume are orthogonal, and we then draw a new pair o2
and s2 for a second observer who is in motion relative to the first.
The Galilean case, where s2 remains parallel to s1, has already been
ruled out. The second case is the one in which s rotates in the same 
direction as o. This one is forbidden by causality, because if we kept 
on rotating, we could eventually end up rotating o by 180 degrees, 
so by a continued process of acceleration, we could send an observer 
into a state in which her sense of time was reversed. We are left 
with only one possibility for Einstein’s spacetime, which is the one 
in which a clockwise rotation of o causes a counterclockwise rotation 
of s, like closing a pair of scissors. 

Now there is a limit to how far this process can go, or else the s 
and o would eventually lie on the same line. But this is impossible, 
for a valid s vector can never be a valid o, nor an o a valid s. Such a 
possibility would mean that an observer would describe two different 
points on his own world-line as simultaneous, but an observer for 
whom no time passes is not an observer at all, since observation 
implies collecting data and then being able to remember it at some 
later time. We conclude that there is a diagonal line that forms 
the boundary between the set of possible s vectors and the set of 
valid o vectors. This line has some slope, and the inverse of this 
slope corresponds to some velocity, which is apparently a universal 
and fixed property of Einstein’s spacetime. This velocity we call c,
and the correspondence principle tells us that c must be very large, 
because otherwise Einsteinian, or “relativistic,” effects such as time 
distortion would have been large even for motion at everyday speeds; 
in the Hafele-Keating experiment they were quite small, even at the 
high speed of a passenger jet. 

Although c is a large number when expressed in meters per sec- 
ond, for convenience in relativity we will always choose units such 
that c = 1. The boundary between s and o vectors then appears on 
spacetime diagrams as a diagonal line at ±45 degrees. In more than 
one spatial dimension, this boundary forms a cone, figure k, and for 
reasons that will become more clear in a moment, this cone is called 
the light cone. Vectors lying inside the light cone are referred to as 
timelike, those outside as spacelike, and those on the cone itself as 
lightlike or null. 
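
In 1+1 dimensions the classification reduces to comparing the sizes of the time and space components. A minimal Python sketch, assuming units with c = 1 and vectors written as (dt, dx); the function name classify is hypothetical:

```python
def classify(dt, dx):
    """Classify a 1+1-dimensional vector (dt, dx) in units where c = 1."""
    if dt == 0 and dx == 0:
        return "zero"
    if abs(dx) < abs(dt):
        return "timelike"    # inside the light cone: a possible o vector
    if abs(dx) > abs(dt):
        return "spacelike"   # outside the light cone: a possible s vector
    return "lightlike"       # on the cone itself (also called null)

print(classify(2, 1))    # timelike
print(classify(1, 2))    # spacelike
print(classify(1, -1))   # lightlike
```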

An important advantage of Einstein’s relativity over Galileo’s 
is that it is compatible with the empirical observation that some 
phenomena travel at a certain fixed speed. Light travels at a fixed 
speed, and so do other phenomena such as gravitational waves (first
directly detected in 2015, with the result announced in 2016). So do all massless particles (subsection
4.3.4). This fixed speed is c, and all observers agree on it. In 1905, 
the only phenomenon known to travel at c was light, so c is usually 
described as the “speed of light,” but from the modern point of view 
it functions more as a kind of conversion factor between our units 
of measurement for time and space. It is a property of spacetime, 
not a property of light. 

More fundamentally, c is the maximum speed of cause and ef- 
fect. If we could propagate cause and effect, e.g., by transmitting 
a signal, at a speed greater than c, then the following argument 
shows that we would be violating either causality or the principle 
that motion is relative. If a signal could be propagated at a speed 
greater than c, then the vector r connecting the cause and the effect 
would be spacelike. By opening and closing the “scissors” of figure 
j, we can always find an observer o who considers r to be a vector of
simultaneity. Thus if faster-than-light propagation is possible, then 
instantaneous propagation is possible, at least for some observer. 
Since motion is relative, this must be possible for all observers, re- 
gardless of their state of motion. Therefore any spacelike vector is 
one along which we can send a signal. But by adding two spacelike 
vectors we can make a vector lying in the past timelike light cone, 
so by relaying the signal we could send a message into the past, 
violating causality. 
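
The last step of the argument, adding two spacelike vectors to get a vector in the past light cone, can be made concrete with numbers. The relay legs below are hypothetical, written as (dt, dx) in units with c = 1, and the helper kind is an illustrative name.

```python
def kind(dt, dx):
    """Timelike, spacelike, or lightlike, in units where c = 1."""
    if abs(dx) < abs(dt):
        return "timelike"
    if abs(dx) > abs(dt):
        return "spacelike"
    return "lightlike"

# Hypothetical relay: each leg is spacelike and goes slightly backward in time.
leg_out = (-1, 3)     # signal sent to a relay station 3 units to the right
leg_back = (-1, -3)   # relay station sends it back to the starting position

total = (leg_out[0] + leg_back[0], leg_out[1] + leg_back[1])

print(kind(*leg_out), kind(*leg_back))   # spacelike spacelike
print(total, kind(*total))               # (-2, 0) timelike, pointing into the past
```

The message returns to its starting position two units of time before it was sent, which is the causality violation described above.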

In interpreting this argument, note that neither the relativity 
of motion nor causality is a logical necessity; they are both just 
generalizations based on a body of evidence. For more on causality, 
and its uncertain empirical status, see section 2.1, p. 43. 


k / The light cone.
l / A ring laser gyroscope.


The ring laser gyroscope Example 2 

If you’ve flown in a jet plane, you can thank relativity for helping 
you to avoid crashing into a mountain or an ocean. Figure l shows
a standard piece of navigational equipment called a ring laser 
gyroscope. A beam of light is split into two parts, sent around the 
perimeter of the device, and reunited. Since the speed of light is 
constant, we expect the two parts to come back together at the 
same time. If they don’t, it’s evidence that the device has been 
rotating. The plane’s computer senses this and notes how much 
rotation has accumulated. 

No frequency-dependence Example 3 

Relativity has only one universal speed, so it requires that all light 
waves travel at the same speed, regardless of their frequency 
and wavelength. Presently the best experimental tests of the in- 
variance of the speed of light with respect to wavelength come 
from astronomical observations of gamma-ray bursts, which are 
sudden outpourings of high-frequency light, believed to originate 
from a supernova explosion in another galaxy. One such obser- 
vation, in 2009, 3 found that the times of arrival of all the different 
frequencies in the burst differed by no more than 2 seconds out 
of a total time in flight on the order of ten billion years! 
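
The quoted numbers translate into a bound on how much the speed of light could vary with frequency. The Python sketch below is an order-of-magnitude estimate only: it takes "ten billion years" at face value and assumes that a fractional difference in speed shows up as the same fractional difference in arrival time.

```python
SECONDS_PER_YEAR = 365.25 * 86_400   # Julian year, adequate for an estimate

arrival_spread = 2.0                       # seconds, from the observation quoted above
time_in_flight = 1e10 * SECONDS_PER_YEAR   # "on the order of ten billion years"

fractional_bound = arrival_spread / time_in_flight
print(f"fractional variation in speed is at most about {fractional_bound:.0e}")
```

The result is a few parts in 10^18.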



Einstein’s train Example 4 

> The figure shows a famous thought experiment devised by Ein- 
stein. A train is moving at constant velocity to the right when bolts 
of lightning strike the ground near its front and back. Alice, stand- 
ing on the dirt at the midpoint of the flashes, observes that the 
light from the two flashes arrives simultaneously, so she says the 
two strikes must have occurred simultaneously. Bob, meanwhile, 
is sitting aboard the train, at its middle. He passes by Alice at the
moment when Alice later figures out that the flashes happened.
Later, he receives flash 2, and then flash 1. He infers that since
both flashes traveled half the length of the train, flash 2 must have
occurred first. How can this be reconciled with Alice’s belief that
the flashes were simultaneous? 

> Figure n shows the corresponding spacetime diagram. It seems
paradoxical that Alice and Bob disagree on simultaneity, but this is
only because we have an ingrained prejudice in favor of Galilean
relativity. Alice’s method of determining that 1 and 2 were
simultaneous is valid, and is known as Einstein synchronization. The
dashed line connecting 1 and 2 is orthogonal to Alice’s world-line.
But Bob has a different opinion about what constitutes simultaneity.
The slanted dashed line is orthogonal to his world-line. According
to Bob, 2 happened before the time represented by this
line, 1 after.

3 http://arxiv.org/abs/0908.1832

Example 4 is of course impractical as described, since real trains 
don’t travel at speeds anywhere near c relative to the dirt. We say 
that their speeds are “nonrelativistic.” Because Einstein coined the 
term “relativity,” and his version of relativity superseded Galileo’s, 
the unmodified word is normally understood to refer to Einsteinian 
relativity. A physicist who studies Einstein-relativity is a relativist. 
A material object moving at a speed very close to c is described as 
ultrarelativistic. One often hears laypeople describing relativity in 
terms of certain effects that would happen “if you went at the speed 
of light.” In fact, as we’ll see in ch. 3 and 4, it is not possible to 
accelerate material objects to c, and in any case that isn’t necessary. 
Relativistic effects exist at all speeds, but they’re weak at speeds 
small compared to c. 

Numerical value of c 

In this book we’ll use units in which c = 1. However, many 
beginners are vexed by the question of why c has the particular 
value it does in a given system of units such as the SI. Related to this 
is the question of whether c could ever change, so that measuring 
it today and measuring it tomorrow would give slightly different 
results. In a system of units where c has units, its value is what it is 
only because of our choice of units, and there is no meaningful way 
to test whether it changes. 

Let’s take the SI as an example of a system of units. The SI was 
originally set up so that the meter and the second were defined in 
terms of properties of our planet. The meter was one forty-millionth 
of the earth’s circumference, and the second was 1/86,400 of a mean 
solar day. Thus when we express c as 3 x 10^8 m/s, we are basically
specifying the factor by which c exceeds the speed at which a point 
on the equator goes around the center of the earth (with additional 
conversion factors of 40,000,000 and 86,400 thrown in). Since the 
properties of our planet are accidents of the formation of the solar 
system, there is no physical theory that can tell us why c has this 
value in the original French-Revolutionary version of the SI. 
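
The arithmetic behind this remark is short enough to spell out. The Python sketch below uses the rounded value of c and the two conversion factors mentioned in the text; the variable names are illustrative.

```python
c = 3e8                      # m/s, the rounded value used in the text
circumference = 40_000_000   # m, from the original definition of the meter
day = 86_400                 # s, from the original definition of the second

equator_speed = circumference / day   # speed of a point on the equator, m/s
ratio = c / equator_speed

print(f"equator speed: {equator_speed:.0f} m/s")
print(f"c exceeds it by a factor of about {ratio:.2e}")
```

In these units the numerical value of c is this ratio multiplied by the circumference and divided by the length of the day, so every factor in it is set by the earth's size and rotation.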

The base units of the SI were redefined over the centuries. Today, 
the second is defined in terms of an atomic standard, and the meter 
is defined as 1/299,792,458 of a light-second. Therefore c has a 
defined value of exactly 299,792,458 m/s. Again, we find that the 
numerical value of c has no fundamental significance; it is merely a 
matter of definition.


It is possible to form the unitless ratio α = e^2/ħc ≈ 1/137,
called the fine structure constant. Its value does not depend on
our choice of units, so it is possible to do experiments to look for
changes in its value over time, e.g., by comparing the spectrum of
hydrogen on earth with the spectrum of distant stars, whose light
has taken billions of years to get to us. Claims have even been made
to the effect that these observations do show a change, although this
appears to have been a mistake. If such a change did occur, we would
not be able to attribute it unambiguously to a change in c rather
than a change in ħ or the fundamental charge.
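
As a check on the value 1/137, the ratio can be evaluated numerically. The expression in the text is the one appropriate to Gaussian-style units; the Python sketch below assumes SI units instead, where the same quantity is written with an extra factor of 1/(4π ε0) and the reduced Planck constant, and it uses CODATA 2018 values for the constants.

```python
import math

# CODATA 2018 values in SI units (e and c are exact by definition).
e = 1.602176634e-19            # elementary charge, C
hbar = 1.054571817e-34         # reduced Planck constant, J s
c = 299_792_458                # speed of light, m/s
epsilon_0 = 8.8541878128e-12   # vacuum permittivity, F/m

# In SI units the Gaussian expression e^2/(hbar c) acquires a factor 1/(4 pi epsilon_0).
alpha = e**2 / (4 * math.pi * epsilon_0 * hbar * c)

print(alpha)       # close to 0.0073
print(1 / alpha)   # close to 137
```

Because the units cancel, any other consistent system of units would give the same number.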

The standards used to define our units could change over time. 
The platinum-iridium standard for the kilogram in Paris is suspected 
to have lost about 50 μg of mass over the last century. Even the
atomic standard used to define the second could be changing due 
to physics beyond our present knowledge. A change in c might 
produce such a change, but any such change could also be produced 
by changes in other physical constants, such as the others occurring 
in the fine structure constant. Such issues are discussed at greater 
length in section 9.6, p. 207. 

Discussion question 

A The machine-gunner in the figure sends out a spray of bullets. Sup- 
pose that the bullets are being shot into outer space, and that the dis- 
tances traveled are trillions of miles (so that the human figure in the dia- 
gram is not to scale). After a long time, the bullets reach the points shown 
with dots which are all equally far from the gun. Their arrivals at those 
points are events A through E, which happen at different times. The chain 
of impacts extends across space at a speed greater than c. Does this 
violate special relativity? 


1.2 Minkowski coordinates 

It is often convenient to name points in spacetime using coordinates, 
and a particular type of naming, chosen by Einstein and Minkowski, 
is the default in special relativity. I’ll refer to the coordinates of this 
system as Minkowski coordinates, and they’re what I have in mind 
throughout this book when I use letters like t and x (or variations 
like x', t', etc.) without further explanation. To define Minkowski
coordinates in 1 + 1 dimensions, we need to pick (1) an event that
we consider to be the origin, (t, x) = (0, 0), (2) an observer-vector
o, and (3) a side of the observer’s world-line that we will call the 
positive x side, and draw on the right in diagrams. The observer is 
required to be inertial, 4 so that by repeatedly making copies of o
and laying them tip-to-tail, we get a chain that lies on top of the
observer’s world-line and represents ticks on the observer’s clock.

4 For now we appeal to the freshman mechanics notion of “inertial.” A better
relativistic definition, which differs from the Newtonian one, is given in ch. 5.

Discussion question A.

Minkowski coordinates use units with c = 1. Explicitly, we define 
the unique vector s that is orthogonal to o, points in the positive 
direction, and has a length of one clock-tick. In practical terms, the 
orthogonality could be defined by Einstein synchronization (example 
4, p. 18), and the length by arranging that a radar echo travels to 
the tip of s and back in two ticks. 

We now construct a graph-paper lattice, figure q, by repeating 
the vectors o and s. This grid defines a name (t, x) for each point 
in spacetime. 

1.3 Measurement 

We would like to have a general system of measurement for relativity, 
but so far we have only an incomplete patchwork. The length of a 
timelike vector can be defined as the time measured on a clock that 
moves along the vector. A spacelike vector has a length that is 
measured on a ruler whose motion is such that in the ruler’s frame 
of reference, the vector’s endpoints are simultaneous. But there is no 
third measuring instrument designed for the purpose of measuring 
lightlike vectors. 

Nor do we automagically get a complete system of measurement 
just by having defined Minkowski coordinates. For example, we 
don’t yet know how to find the length of a timelike vector such 
as (Δt, Δx) = (2, 1), and we suspect that it will not equal 2,
since the Hafele-Keating experiment tells us that a clock undergoing
the motion represented by Δx = 1 will probably not agree with a
clock carried by the observer whose clock we used in defining these
coordinates. 

1.3.1 Invariants 

The whole topic of measurement is apt to be confusing, because 
the shifting ground of relativity makes us feel as if we’ve walked
into a Salvador Dali landscape of melting pocket watches. A good 
way to regain our bearings is to look for quantities that are in- 
variant: they are the same in all frames of reference. A Euclidean 
invariant, such as a length or an angle, is one that doesn’t change 
under rotations: all observers agree on its value, regardless of the 
orientations of their frames of reference. For a relativistic invariant, 
we require in addition that observers agree no matter what state of 
motion they have. (A transformation that changes from one iner- 
tial frame of reference to another, without any rotation, is called a 
boost.) 

Electric charge is a good example of an invariant. Electrons 
in atoms typically have velocities of 0.01 to 0.1 (in our relativis- 
tic units, where c = 1), so if an electron’s charge depended on its 
motion relative to an observer, atoms would not be electrically
neutral. Experiments have been done 5 to test this to the phenomenal
precision of one part in 10^21, with null results.

A vector can never be an invariant, since it changes direction 
under a rotation. (Some vectors, such as velocities, also change un- 
der a boost.) In freshman mechanics, any quantity, such as energy, 
that wasn’t a vector usually fell into the category we referred to as 
scalars. In relativity, however, the term “scalar” has a much more 
restrictive definition, which we’ll discuss in section 6.2.1, p. 127. 

By the way, beginners in relativity sometimes get confused about 
invariance as opposed to conservation. They are not the same thing, 
and neither implies the other. For example, momentum has a direc- 
tion in space, so it clearly isn’t invariant — but we’ll see in section 
4.3 that there is a relativistic version of the momentum vector that 
is conserved. As in Newtonian mechanics, we don’t care if all ob- 
servers agree on the momentum of a system — we only care that 
the law of momentum conservation is valid and has the same form 
in all frames. Conversely, there are quantities that are invariant but 
not conserved, mass being an example. 

5 Marinelli and Morpurgo, “The electric neutrality of matter: A summary,”
Physics Letters B137 (1984) 439.

1.3.2 The metric

Area in 1 + 1 dimensions is also an invariant, as proved on p. 49.
The invariance of area has little importance on its own, but it
provides a good stepping stone toward a relativistic system of
measurement. Suppose that we have events A (Charles VII is restored to
the throne) and B (Joan of Arc is executed). Now imagine that
technologically advanced aliens want to be present at both A and 
B, but in the interim they wish to fly away in their spaceship, be 
present at some other event P (perhaps a news conference at which 
they give an update on the events taking place on earth), but get 
back in time for B. Since nothing can go faster than c (which we 
take to equal 1), P cannot be too far away. The set of all possible 
events P forms a rectangle, figure r/1, in the 1 + 1-dimensional plane 
that has A and B at opposite corners and whose edges have slopes 
equal to ±1. We call this type of rectangle a light-rectangle. 

The area of this rectangle will be the same regardless of one’s 
frame of reference. In particular, we could choose a special frame 
of reference, panel 2 of the figure, such that A and B occur in the 
same place. (They do not occur at the same place, for example, in 
the sun’s frame, because the earth is spinning and going around the 
sun.) Since the speed c = 1 is the same in all frames of reference, 
and the sides of the rectangle had slopes ±1 in frame 1, they must 
still have slopes ±1 in frame 2. The rectangle becomes a square, 
whose diagonals are an o and an s for frame 2. The length of these 
diagonals equals the time τ elapsed on a clock that is at rest in frame
2, i.e., a clock that glides through space at constant velocity from A 
to B, reuniting with the planet earth when its orbit brings it to B. 
The area of the gray regions can be interpreted as half the square 
of this gliding-clock time, which is called the proper time. “Proper” 
is used here in the somewhat archaic sense of “own” or “self,” as in 
“The Vatican does not lie within Italy proper.” Proper time, which 
we notate $\tau$, can only be defined for timelike world-lines, since a
lightlike or spacelike world-line isn’t possible for a material clock. 

In terms of (Minkowski) coordinates, suppose that events A and
B are separated by a distance x and a time t. Then in general
$t^2 - x^2$ gives the square of the gliding-clock time. Proof: Because
of the way that area scales with a rescaling of the coordinates, the
expression must have the form $(\ldots)t^2 + (\ldots)tx + (\ldots)x^2$, where each
$(\ldots)$ represents a unitless constant. The $tx$ coefficient must be zero
by the isotropy of space. The $t^2$ coefficient must equal 1 in order
to give the right answer in the case of x = 0, where the coordinates
are those of an observer at rest relative to the clock. Since the area
vanishes for x = t, the $x^2$ coefficient must equal -1.

When |x| is greater than |t|, events A and B are so far apart in
space and so close together in time that it would be impossible to 
have a cause and effect relationship between them, since c = 1 is 
the maximum speed of cause and effect. In this situation $t^2 - x^2$ is
negative and cannot be interpreted as a clock time, but it can be 
interpreted as minus the square of the distance between A and B, as 
measured in a frame of reference in which A and B are simultaneous. 

Generalizing to 3+1 dimensions and to any vector v, not just 
a displacement in spacetime, we have a measurement of the vector
defined by

$$v_t^2 - v_x^2 - v_y^2 - v_z^2 .$$

In the special case where v is a spacetime displacement, this can be
referred to as the spacetime interval. Except for the signs, this looks
very much like the Pythagorean theorem, which is a special case of
the vector dot product. We therefore define a function g called the
metric,

$$g(u, v) = u_t v_t - u_x v_x - u_y v_y - u_z v_z .$$

Because of the analogy with the Euclidean dot product, we often use 
the notation $u \cdot v$ for this quantity, and we sometimes call it the inner
product. The metric is the central object of relativity. In general 
relativity, which describes gravity as a curvature of spacetime, the 
coefficients occurring on the right-hand side are no longer ±1, but 
must vary from point to point. Even in special relativity, where the 
coefficients can be made constant, the definition of g is arbitrary up 
to a nonzero multiplicative constant, and in particular many authors 
define g as the negative of our definition. The sign convention we 
use is the most common one in particle physics, while the opposite
is more common in classical relativity. The set of signs, $+---$ or
$-+++$, is called the signature of the metric.

In subsection 1.1.3 we developed the idea of orthogonality of 
spacetime vectors, with the physical interpretation that if an ob- 
server moves along a vector o, a vector s that is orthogonal to o is 
a vector of simultaneity. This corresponds to the vanishing of the 
inner product, o • s = 0, and is only imperfectly analogous to the 
idea that Euclidean vectors are perpendicular if their dot product is 
zero. In particular, a nonzero Euclidean vector is never perpendic- 
ular to itself, but for any lightlike vector v we have v • v = 0. The 
metric doesn’t give us a measure of the length of lightlike vectors. 
Physically, neither a ruler nor a clock can measure such a vector. 
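
The properties of the inner product described above can be checked numerically. The following is a minimal Python sketch, assuming the $+---$ signature and natural units (c = 1); the function names `inner` and `classify` and the sample vectors are illustrative choices, not part of the text.

```python
def inner(u, v):
    """Minkowski inner product, signature + - - -, components (t, x, y, z)."""
    return u[0] * v[0] - sum(a * b for a, b in zip(u[1:], v[1:]))

def classify(v):
    """Label a vector by the sign of its inner product with itself."""
    s = inner(v, v)
    if s > 0:
        return "timelike"
    if s < 0:
        return "spacelike"
    return "lightlike"

o = (1.0, 0.0, 0.0, 0.0)      # world-line direction of an observer at rest
s = (0.0, 1.0, 0.0, 0.0)      # a vector of simultaneity for that observer
ray = (1.0, 1.0, 0.0, 0.0)    # a light ray moving in the +x direction

print(inner(o, s))            # vanishes: o and s are orthogonal
print(inner(ray, ray))        # vanishes: a lightlike vector is orthogonal to itself
print(classify((2.0, 1.0, 0.0, 0.0)), classify((1.0, 2.0, 0.0, 0.0)))
```

Because the function only distinguishes the first component from the rest, it works unchanged for 1+1-dimensional vectors written as pairs (t, x).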

The metric in SI units Example 5 

Units with c = 1 are known as natural units. (They are natural 
to relativity in the same sense that units with $\hbar = 1$ are natural
to quantum mechanics.) Any equation expressed in natural units 
can be reexpressed in SI units by the simple expedient of insert- 
ing factors of c wherever they are needed in order to get units that 
make sense. The result for the metric could be

$$g(u, v) = c^2 u_t v_t - u_x v_x - u_y v_y - u_z v_z$$

or

$$g(u, v) = u_t v_t - (u_x v_x + u_y v_y + u_z v_z)/c^2 .$$

It doesn’t matter which we pick, since the metric is arbitrary up to
a constant factor. For a spacetime displacement, the square root of
the interval comes out in meters with the former expression and in
seconds with the latter.
Orthogonal light rays? Example 6

> On a spacetime diagram in 1+1 dimensions, we represent the 
light cone with the two lines x = ±t, drawn at an angle of 90 
degrees relative to one another. Are these lines orthogonal?

> No. For example, if u = (1, 1) and v = (1, -1), then $u \cdot v$ is 2, not
zero.

Pioneer 10 Example 7 

> The Pioneer 10 space probe was launched in 1972, and in 1973
was the first craft to fly by the planet Jupiter. It crossed the orbit
of the planet Neptune in 1983, after which telemetry data were
received until 2002. The following table gives the spacecraft’s
position relative to the sun at exactly midnight on January 1, 1983
and January 1, 1995. The 1983 date is taken to be t = 0.

t (s)                  x (m)           y (m)           z (m)
0                      1.784 × 10^12   3.951 × 10^12   0.237 × 10^12
3.7869120000 × 10^8    2.420 × 10^12   8.827 × 10^12   0.488 × 10^12

Compare the time elapsed on the spacecraft to the time in a frame 
of reference tied to the sun. 

> We can convert these data into natural units, with the distance 
unit being the second (i.e., a light-second, the distance light trav- 
els in one second) and the time unit being seconds. Converting 
and carrying out this subtraction, we have: 

Δt (s)                 Δx (s)          Δy (s)          Δz (s)
3.7869120000 × 10^8    0.2121 × 10^4   1.626 × 10^4    0.084 × 10^4

Comparing the exponents of the temporal and spatial numbers, 
we can see that the spacecraft was moving at a velocity on the 
order of $10^{-4}$ of the speed of light, so relativistic effects should be
small but not completely negligible. 

Since the interval is timelike, we can take its square root and 
interpret it as the time elapsed on the spacecraft. The result is 
$\tau = 3.786911996 \times 10^8$ s. This is 0.4 s less than the time elapsed
in the sun’s frame of reference. 
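
The arithmetic of this example can be reproduced with a short script. This is a sketch under the same idealization the example uses, namely that the probe moved inertially between the two tabulated events; the function name `elapsed_times` and the value used for c are choices made here, not taken from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def elapsed_times(t0, r0, t1, r1):
    """Return (sun-frame time, proper time, difference) in seconds
    for straight-line motion between two events."""
    dt = t1 - t0
    # spatial separations converted from meters to light-seconds
    d = [(b - a) / C for a, b in zip(r0, r1)]
    dr2 = sum(x * x for x in d)
    interval = dt * dt - dr2          # positive for a timelike separation
    tau = math.sqrt(interval)
    # dt - tau written as dr2 / (dt + tau), which avoids subtracting
    # two nearly equal nine-digit numbers
    deficit = dr2 / (dt + tau)
    return dt, tau, deficit

dt, tau, deficit = elapsed_times(
    0.0, (1.784e12, 3.951e12, 0.237e12),
    3.786912e8, (2.420e12, 8.827e12, 0.488e12),
)
print(f"sun-frame time: {dt:.1f} s")
print(f"proper time:    {tau:.1f} s")
print(f"difference:     {deficit:.2f} s")
```

The difference is computed from the identity $\Delta t - \tau = (\Delta x^2 + \Delta y^2 + \Delta z^2)/(\Delta t + \tau)$ rather than by direct subtraction, because the effect is only a few parts in $10^9$ of the total.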

1.3.3 The gamma factor 

Figure s is the relativistic version of example 1 on p. 14. We
intend to analyze it using the metric, and since the metric gives
the same result in any frame, we have chosen for convenience to
represent it in the frame in which the earth is at rest. We have
a = (t, 0) and b = (t, vt), where v is the velocity of the spaceship
relative to the earth. Application of the metric gives proper time t
for the earthbound twin and $t\sqrt{1 - v^2}$ for the traveling twin. The
same results apply for c and d. The result is that the earthbound
twin experiences a time that is greater by a factor $\gamma$ (Greek letter
gamma) defined as $\gamma = 1/\sqrt{1 - v^2}$. If v is close to c, $\gamma$ can be large,
and we find that when the astronaut twin returns home, still youth-
ful, the earthbound twin can be old and gray. This was at one time
referred to as the twin paradox, and it was considered paradoxical
either because it seemed to defy common sense or because the trav-
eling twin could argue that she was the one at rest while the earth
was moving. The violation of common sense is in fact what was ob-
served in the Hafele-Keating experiment, and the latter argument
is fallacious for the same reasons as in the Galilean version given in
example 1.

s / The twin “paradox.”
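
As a numerical illustration of the calculation above, the following sketch applies the 1+1-dimensional metric to the vectors a and b. The values t = 10 and v = 0.6 are hypothetical, chosen only so that the numbers come out round, and the function names are illustrative.

```python
import math

def proper_time(d):
    """Proper time along a timelike displacement d = (t, x), with c = 1."""
    t, x = d
    return math.sqrt(t * t - x * x)

def gamma(v):
    """Lorentz factor for speed v, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

t, v = 10.0, 0.6          # hypothetical: earth time for one leg, ship speed
a = (t, 0.0)              # earthbound twin
b = (t, v * t)            # traveling twin, outbound leg

print(proper_time(a))                   # earthbound twin's time for this leg
print(proper_time(b))                   # traveling twin's time, t*sqrt(1 - v^2)
print(proper_time(a) / proper_time(b))  # the ratio of the two
print(gamma(v))                         # agrees with the ratio
```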

We have in general the following interpretation: 


Time dilation 

A clock runs fastest in the frame of reference of an observer 
who is at rest relative to the clock. An observer in motion 
relative to the clock at speed v perceives the clock as running 
more slowly by a factor of $\gamma$.



t / A graph of $\gamma$ as a function
of v.


Although this is phrased in terms of clocks, we interpret it as 
telling us something about time itself. The attitude is that we should 
define a concept in terms of the operations required in order to mea- 
sure it: time is defined as what a clock measures. This philosophy, 
which has been immensely influential among physicists, is called 
operationalism and was developed by P.W. Bridgman in the 1920’s.
Our operational definition of time works because the rates of all 
physical processes are affected equally by time dilation.⁶ By the
time the twins in figure s are reunited, not only has the traveling 
twin heard fewer ticks from her antique mechanical pocket watch, 
but she has also had fewer heartbeats, and the ship’s atomic clock 
agrees with her watch to within the precision of the watch. 


self-check A 

What is $\gamma$ when v = 0? What does this mean? Express the equation for
$\gamma$ in SI units. > Answer, p. ??


Time dilation is symmetrical in the sense that it treats all frames 
of reference democratically. If observers A and B aren’t at rest 
relative to each other, then A says B’s time runs slow, but B says A 
is the slow one. In figure s, the laws of physics make no distinction 
between the frames of reference that coincide with vectors a and 
b; as in the corresponding Galilean case of example 1 on p. 14, the 
asymmetry comes about because a and c are parallel, but b and d 
are not. 

> For more on this topic, see section 6.1.
As shown in example 8 below, consistency demands that in ad-
dition to the effect on time we have a similar effect on distances:


Length contraction 

A meter-stick appears longest to an observer who is at rest 
relative to it. An observer moving relative to the meter-stick 
at v observes the stick to be shortened by a factor of $\gamma$.


The visualization of length contraction in terms of spacetime dia- 
grams is presented in figure z/2 on p. 30. Our present discussion is 
limited to 1+1 dimensions, but in 3+1, only the length along the 
line of motion is contracted (ch. 2, problem 2, p. 51). 


An interstellar road trip Example 8 

Alice stays on earth while her twin Betty heads off in a spaceship 
for Tau Ceti, a nearby star. Tau Ceti is 12 light-years away, so 
even though Betty travels at 87% of the speed of light, it will take 
her a long time to get there: 14 years, according to Alice. 



Betty experiences time dilation. At this speed, her $\gamma$ is 2.0, so that
the voyage will only seem to her to last 7 years. But there is per- 
fect symmetry between Alice’s and Betty’s frames of reference, 
so Betty agrees with Alice on their relative speed. (For more de- 
tail on this point, see example 11, p. 31.) Betty sees herself as 
being at rest, while the sun and Tau Ceti both move backward at 
87% of the speed of light. How, then, can she observe Tau Ceti 
to get to her in only 7 years, when it should take 14 years to travel
12 light-years at this speed? 

We need to take into account length contraction. Betty sees the 
distance between the sun and Tau Ceti to be shrunk by a factor of 
2. The same thing occurs for Alice, who observes Betty and her 
spaceship to be foreshortened. 
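
The numbers in this example can be checked directly. The sketch below uses the speed and distance quoted in the example; the variable names are illustrative, and the unrounded results differ slightly from the rounded figures in the text.

```python
import math

v = 0.87                 # Betty's speed as a fraction of c
distance = 12.0          # sun to Tau Ceti in light-years, in Alice's frame

g = 1.0 / math.sqrt(1.0 - v**2)      # gamma
t_alice = distance / v               # trip duration according to Alice, years
t_betty = t_alice / g                # time elapsed on Betty's clock, years
d_betty = distance / g               # sun-Tau Ceti distance in Betty's frame

print(f"gamma = {g:.2f}")
print(f"Alice: {t_alice:.1f} years; Betty: {t_betty:.1f} years")
print(f"contracted distance = {d_betty:.1f} light-years")
print(f"Betty's check, distance/speed = {d_betty / v:.1f} years")
```

The last line is the consistency check that the example describes: in Betty's frame, the contracted distance divided by the agreed relative speed reproduces the time she measures.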
v / Time dilation measured
with an atomic clock at low
speeds. The theoretical curve,
shown with a dashed line, is
calculated from $\gamma = 1/\sqrt{1 - v^2}$;
at these small velocities, the
approximation $\gamma \approx 1 + v^2/2$ is
excellent, and the graph is in-
distinguishable from a parabola.
This graph corresponds to an 
extreme close-up view of the 
lower left corner of figure t. The 
error bars on the experimental 
points are about the same size 
as the dots. 
A moving atomic clock Example 9

Expanding $\gamma$ in a Taylor series, we find $\gamma \approx 1 + v^2/2$. When
v is small, relativistic effects are therefore approximately proportional
to $v^2$, so it is very difficult to observe them at low speeds. This was
the reason that the Hafele-Keating experiment was done aboard 
passenger jets, which fly at high speeds. Jets, however, fly at 
high altitude, and this brings in a second time dilation effect, a 
general-relativistic one due to gravity. The main purpose of the 
experiment was actually to test this effect. 

It was not until four decades after Hafele and Keating that anyone 
did a conceptually simple atomic clock experiment in which the 
only effect was motion, not gravity. In 2010, however, Chou et al.⁷
succeeded in building an atomic clock accurate enough to detect 
time dilation at speeds as low as 10 m/s. Figure v shows their 
results. Since it was not practical to move the entire clock, the 
experimenters only moved the aluminum atoms inside the clock 
that actually made it “tick.” 
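
To get a feeling for the size of the effect at 10 m/s, the sketch below evaluates $\gamma - 1$ both from the leading Taylor term and from the defining formula. The function names are illustrative. The comparison also shows a practical point that is an observation about floating-point arithmetic, not something claimed in the text: at such low speeds the defining formula is limited by rounding error in double precision, while the series term is not.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def gamma_minus_one_exact(v):
    """gamma - 1 from the defining formula; v in units of c."""
    return 1.0 / math.sqrt(1.0 - v * v) - 1.0

def gamma_minus_one_series(v):
    """Leading term of the Taylor series, gamma - 1 ~ v^2/2."""
    return v * v / 2.0

v = 10.0 / C  # 10 m/s expressed as a fraction of c
print(gamma_minus_one_series(v))   # fractional slowing of the moving clock
print(gamma_minus_one_exact(v))    # same quantity, limited by rounding error
```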

Large time dilation Example 10

The time dilation effects described in example 9 were very small. 
If we want to see a large time dilation effect, we can’t do it with 
something the size of the atomic clocks they used; the kinetic 
energy would be greater than the total megatonnage of all the 
world’s nuclear arsenals. We can, however, accelerate subatomic 
particles to speeds at which y is large. For experimental particle 
physicists, relativity is something you do all day before heading 
home and stopping off at the store for milk. An early, low-precision 
experiment of this kind was performed by Rossi and Hall in 1941,
using naturally occurring cosmic rays. Figure w shows a 1974
experiment⁸ of a similar type which verified the time dilation pre-
dicted by relativity to a precision of about one part per thousand. 


Particles called muons (named after the Greek letter $\mu$, “myoo”)
were produced by an accelerator at CERN, near Geneva. A muon
is essentially a heavier version of the electron. Muons undergo
radioactive decay, lasting an average of only 2.197 $\mu$s before they
evaporate into an electron and two neutrinos. The 1974 experi- 
ment was actually built in order to measure the magnetic proper- 
ties of muons, but it produced a high-precision test of time dilation 
as a byproduct. Because muons have the same electric charge 
as electrons, they can be trapped using magnetic fields. Muons 
were injected into the ring shown in figure w, circling around it un- 
til they underwent radioactive decay. At the speed at which these 
muons were traveling, they had $\gamma = 29.33$, so on the average they
lasted 29.33 times longer than the normal lifetime. In other words,
they were like tiny alarm clocks that self-destructed at a randomly
selected time. The graph shows the number of radioactive decays
counted, as a function of the time elapsed after a given stream of
muons was injected into the storage ring. The two dashed lines
show the rates of decay predicted with and without relativity. The
relativistic line is the one that agrees with experiment.

7 Science 329 (2010) 1630

8 Bailey et al., Nucl. Phys. B150 (1979) 1

w / Left: Apparatus used for the test of relativistic time dilation described in example 10. The prominent
black and white blocks are large magnets surrounding a circular pipe with a vacuum inside. (c) 1974 by CERN.
Right: Muons accelerated to nearly c undergo radioactive decay much more slowly than they would according
to an observer at rest with respect to the muons. The first two data-points (unfilled circles) were subject to
large systematic errors.
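
The two predictions compared in example 10 can be sketched numerically. The values of $\gamma$ and of the rest lifetime are those quoted in the example; the exponential decay law is the standard one for radioactive decay, and the function name `surviving_fraction` and the sample time of 100 microseconds are illustrative choices.

```python
import math

GAMMA = 29.33            # Lorentz factor of the stored muons, from the text
TAU_REST = 2.197e-6      # mean lifetime at rest, in seconds, from the text

v = math.sqrt(1.0 - 1.0 / GAMMA**2)   # speed as a fraction of c
tau_lab = GAMMA * TAU_REST            # mean lifetime in the laboratory frame

def surviving_fraction(t, dilated=True):
    """Fraction of muons remaining after laboratory time t (seconds)."""
    tau = tau_lab if dilated else TAU_REST
    return math.exp(-t / tau)

print(f"v = {v:.5f} c, lab lifetime = {tau_lab * 1e6:.2f} microseconds")
print(surviving_fraction(100e-6))                 # with time dilation
print(surviving_fraction(100e-6, dilated=False))  # without time dilation
```

Without time dilation essentially no muons would survive 100 microseconds, which is why the two dashed lines in the graph are so easy to tell apart.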
x / Two events are given as points
on a graph of position versus 
time. Joan of Arc helps to re- 
store Charles VII to the throne. At 
a later time and a different posi- 
tion, Joan of Arc is sentenced to 
death. 
y / The Lorentz transformation.

z / 1. The clock is at rest in the 
original frame of reference, and 
it measures a time interval t. In 
the new frame of reference, the 
time interval is greater by a fac- 
tor of y. 2. The ruler is moving in 
the first frame, represented by a 
square, but at rest in the second 
one, shown as a parallelogram. 
Each picture of the ruler is a snap- 
shot taken at a certain moment 
as judged according to the sec- 
ond frame’s notion of simultaneity. 
An observer in first frame judges 
the ruler’s length instead accord- 
ing to that frame’s definition of si- 
multaneity, i.e., using points that 
are lined up horizontally on the 
graph. The ruler appears shorter 
in the frame in which it is mov- 
ing. 





1.4 The Lorentz transformation 

Philosophically, coordinates are unnecessary, but they are conve- 
nient. They are arbitrary, so we can change from one set to an- 
other. For example, we can change the units used to measure time 
and position, as in the first and second panels of figure x. Nothing 
changes about the underlying events; only the labels are different. 
The third panel shows a convenient convention we will use to depict 
such changes visually. The gray rectangle represents the original 
grid from the first panel, while the grid of black lines represents the 
new version from the second panel. Omitting the grid from the gray 
rectangle makes the diagram easier to decode visually. 

In special relativity it is of interest to convert between the Min- 
kowski coordinates of observers who are in motion relative to one 
another. The result, shown in figure y, is a kind of stretching and 
smooshing of the diagonals. Since the area is invariant, one diagonal 
grows by the same factor by which the other shrinks. This change 
of coordinates is called the Lorentz transformation. 





Figure z shows how time dilation and length contraction come 
about in this picture. It should be emphasized here that the Lorentz 
transformation includes more effects than just length contraction 
and time dilation. Many beginners at relativity get confused and
come to erroneous conclusions by trying to reduce everything to a
matter of inserting factors of $\gamma$ in various equations. If the Lorentz
transformation amounted to nothing more than length contraction 
and time dilation, it would be merely a change of units like the one 
shown in figure x. 

The Lorentz transformation can be notated algebraically: 

$$\begin{aligned}
t' &= \gamma t - v\gamma x \\
x' &= -v\gamma t + \gamma x
\end{aligned} \qquad (1)$$


The fact that this is the correct relativistic transformation can be
verified by noting that (1) the speed-of-light lines x = ±t are pre-
served, and (2) the determinant equals 1, so that areas are preserved.
Alternatively, it is sufficient to check the invariance of the spacetime
interval under this transformation.
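
These checks are easy to carry out numerically. The following is a minimal sketch in natural units; the helper names `boost`, `apply`, and `interval`, the speed 0.8, and the sample event are all choices made here for illustration.

```python
import math

def boost(v):
    """Matrix of the Lorentz transformation, eq. (1), acting on (t, x)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return ((g, -v * g), (-v * g, g))

def apply(m, event):
    """Multiply the 2x2 matrix m by the column vector (t, x)."""
    t, x = event
    return (m[0][0] * t + m[0][1] * x, m[1][0] * t + m[1][1] * x)

def interval(event):
    """Spacetime interval t^2 - x^2 between the origin and the event."""
    t, x = event
    return t * t - x * x

L = boost(0.8)
det = L[0][0] * L[1][1] - L[0][1] * L[1][0]
print(det)                                   # close to 1: areas are preserved
print(apply(L, (1.0, 1.0)))                  # stays on the line x = t
print(apply(L, (1.0, -1.0)))                 # stays on the line x = -t
print(interval((5.0, 3.0)), interval(apply(L, (5.0, 3.0))))  # equal up to rounding
print(apply(boost(-0.8), apply(L, (5.0, 3.0))))  # boost by -v undoes boost by v
```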

Equations (1) treat space and time in a perfectly symmetric 
way, but this should not be taken as implying that special relativity 
perfectly embodies such a symmetry. For example, I can easily 
revisit a place that I’ve been to before, but I can’t go back in time. 
And of course we have three dimensions of space; our use of 1+1 
dimensions rather than 3+1 is just a matter of convenience for the 
moment. Note also that there is no exact analogy between figure 
z/1, where the clock is a pointlike object tracing a line through
spacetime, and z/2, where the ruler is an extended body that sweeps
out a parallel-sided ribbon. 

Observers agree on their relative speeds Example 11

Observer A says observer B is moving away from her at velocity
v; is it true, as in Galilean relativity, that B measures the same
speed for A? Yes, it is true, but not completely obvious. One way
to verify this fact is to check that Lorentz transformations with ve-
locities v and -v are inverses. A more physically transparent jus-
tification is shown in figure aa. In aa/1, A determines B’s velocity
relative to her by sending out two round-trip signals at the speed
of light, and measuring the difference between the two round-trip
times. Because space is the same in all directions,⁹ the exper-
imental data are exactly the same when B carries out the mea-
surement, aa/2, and therefore B infers the same speed.

Motion in the opposite direction Example 12 

Figure ab shows the case where the observer whose frame is 
represented by the grid is moving to the left relative to the one 
whose frame is represented by the gray square. 

Other quadrants Example 13 

So far I’ve been arbitrarily choosing to draw only the first quad- 
rant of each coordinate system. Figure ac shows a region that 
includes all four quadrants. 

9 This is discussed in more detail on p. 46 
aa / Example 11.



ab / Example 12. 



ac / Example 13. 
A numerical example of invariance Example 14

Figure ad shows two frames of reference in motion relative to
one another at v = 3/5. (For this velocity, the stretching and
squishing of the main diagonals are both by a factor of 2.) Events
are marked at coordinates that in the frame represented by the
square are

(t, x) = (0, 0) and
(t, x) = (13, 11).

The interval between these events is $13^2 - 11^2 = 48$. In the
frame represented by the parallelogram, the same two events lie
at coordinates

(t', x') = (0, 0) and
(t', x') = (8, 4).

Calculating the interval using these values, the result is
$8^2 - 4^2 = 48$, which comes out the same as in the other frame.
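
The coordinates in this example can be reproduced exactly with rational arithmetic, since for v = 3/5 the factor $\gamma$ = 5/4 is itself a rational number. The sketch below uses Python's standard `fractions` module; the helper name `boost_event` is illustrative.

```python
from fractions import Fraction

def boost_event(event, v, g):
    """Apply the Lorentz transformation, eq. (1), to (t, x).
    v and g (gamma) are passed in as exact rationals."""
    t, x = event
    return (g * t - v * g * x, -v * g * t + g * x)

v = Fraction(3, 5)
g = Fraction(5, 4)                 # gamma for v = 3/5, since 1 - v^2 = 16/25
assert g * g * (1 - v * v) == 1    # confirms that g really is gamma

t, x = 13, 11
tp, xp = boost_event((t, x), v, g)
print(tp, xp)                             # prints: 8 4
print(t * t - x * x, tp * tp - xp * xp)   # prints: 48 48
```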
ae / Example 15: In the garage’s frame of reference, the bus is moving, and can fit in the garage due
to its length contraction. In the bus’s frame of reference, the garage is moving, and can’t hold the bus due to its 
length contraction. 


The garage paradox Example 15 

One of the most famous of all the so-called relativity paradoxes 
has to do with our incorrect feeling that simultaneity is well de- 
fined. The idea is that one could take a schoolbus and drive it at 
relativistic speeds into a garage of ordinary size, in which it nor- 
mally would not fit. Because of the length contraction, the bus 
would supposedly fit in the garage. The driver, however, will per- 
ceive the garage as being contracted and thus even less able to 
contain the bus. 

The paradox is resolved when we recognize that the concept of 
fitting the bus in the garage “all at once” contains a hidden as- 
sumption, the assumption that it makes sense to ask whether the 
front and back of the bus can simultaneously be in the garage. 
Observers in different frames of reference moving at high relative 
speeds do not necessarily agree on whether things happen si- 
multaneously. As shown in figure ae, the person in the garage’s 
frame can shut the door at an instant B he perceives to be si-
multaneous with the front bumper’s arrival A at the back wall of
the garage, but the driver would not agree about the simultaneity 
of these two events, and would perceive the door as having shut 
long after she plowed through the back wall. 



af / Example 16. 



Shifting clocks Example 16 

The top row of clocks in the figure are located in three different 
places. They have been synchronized in the frame of reference 
of the earth, represented by the paper. This synchronization is 
carried out by exchanging light signals (Einstein synchronization), 
as in example 4 on p. 18. For example, if the front and back clocks
both send out flashes of light when they think it’s 2 o’clock, the 
one in the middle will receive them both at the same time. Event 
A is the one at which the back clock A reads 2 o’clock, etc. 

The bottom row of clocks are aboard the train, and have been 
synchronized in a similar way. For the reasons discussed in ex- 
ample 4, their synchronization differs from that of the earth-based 
clocks. By referring to the diagram of the Lorentz transformation 
shown on the right, we see that in the frame of the train, 2, C 
happens first, then B, then A. 

This is an example of the interpretation of the term $t' = \ldots - v\gamma x$
in the Lorentz transformation (eq. (1), p. 31). Because the events
occur at different x’s, each is shifted in time relative to the next,
according to clocks synchronized in frame 2 (t', the train).
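
The ordering of the three events can be reproduced from the $-v\gamma x$ term alone. The sketch below assumes, for illustration only, that the train moves in the positive x direction at v = 0.5 and that the back, middle, and front clocks sit at x = -1, 0, and 1; none of these numbers come from the text.

```python
import math

def t_prime(t, x, v):
    """Time coordinate in the moving frame, from eq. (1) with c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * t - v * g * x

v = 0.5  # hypothetical speed of the train relative to the earth
# Events that are simultaneous in the earth's frame (t = 0).
events = {"A": (0.0, -1.0), "B": (0.0, 0.0), "C": (0.0, 1.0)}

# Sort the events by the time assigned to them in the train's frame.
order = sorted(events, key=lambda name: t_prime(*events[name], v))
for name in order:
    print(name, round(t_prime(*events[name], v), 3))
```

With these assumptions the events come out in the order C, B, A, matching the reading of the diagram described above.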

Deja vu? Example 17 

The grids we’ve been drawing are mere conventions, as elaborate 
and arbitrary as dress in the court of Louis XIV. They come from 
a surveying process, which may need to be planned in advance 
and whose results we might not be able to see until later. 

The dark line in figure ag is the world-line of an observer O, 
who moves inertially for a while, accelerates to the left, and then 
moves inertially again. On the right are pasted two coordinate
grids adapted to the two inertial segments. At event E, Bush 
steals the 2000 election, and this is depicted as being simulta- 
neous with both event A and event B. Does poor O see it happen 
twice? No, even if the bad news is transmitted by a signal mov- 
ing at the speed of light (dashed line), O receives it only once, at 
event C. 

The only problem here is a poor choice of labels, which causes 
E to have more than one label. Something similar happens in a 
constant-acceleration frame, section 7.1, p. 143. Cf. also p. 73. 


Many mistakes by beginners at relativity revolve around a set of 
unexamined preconceptions about what it means to observe things. 
One imagines that effects such as length contraction and time di- 
lation are what an observer actually sees, and perhaps that this 
process of seeing is instantaneous. Or one thinks of Minkowski co- 
ordinates as if they were the result of a simple and automatic process 
of perception by an observer. This is the kind of thinking that will 
lead one to believe that example 17 is crazy or paradoxical. 

As another example, it should not be imagined that the length 
contraction of a stick by $1/\gamma$ is what an observer actually sees when
looking at the stick. Optical observations are influenced, for exam- 
ple, by the unequal times taken for light to propagate from the ends 
of the stick to the eye. A simulation of this type of effect is drawn 
in example 7 on p. 135. 

Length contraction, time dilation, the observer-dependence of si- 
multaneity, and Minkowski coordinates are all sophisticated results 
of a laborious process of collecting and analyzing data obtained by 
techniques such as Einstein synchronization, which require actions 
such as consulting atomic clocks or exchanging signals between dif- 
ferent points at the speed of light. Figure ah outlines such a process 
in a cartoonish way. A fleet of rocket ships, carrying surveyors, is 
sent out from Earth and dispersed throughout a vast region of space. 
Surveyors look through their theodolites at images, which are formed 
by light rays (dashed lines) that have arrived after traveling at the 
finite speed c. Such light rays carry old, stale information about 
various events. A nuclear war has broken out. Rock and roll music 
has arrived on Saturn. The resulting data are then transmitted by 
various means (passenger pigeon, Morse-coded radio, paper mail) 
and consolidated at the surveying office. At the office, workers at 
a long row of desks crunch the numbers and produce a chart of 
Minkowski coordinates with the events marked in. 



ah / Minkowski coordinates 
are the result of a complicated 
process of surveying. 
ai / The twin paradox, inter-
preted as a triangle inequality.


1.5 ★Triangle and Cauchy-Schwarz 
inequalities 

In Euclidean geometry, we have the intuitively obvious fact that
any side of a triangle is no greater than the sum of the other two
sides. This can be written in terms of vectors as $|m + n| \le |m| + |n|$.
Closely related to it is the inequality $|m \cdot n| \le |m| |n|$, known as
the Cauchy-Schwarz inequality, which can be seen because
$m \cdot n = |m| |n| \cos\theta$, where $\theta$ is the angle between the two vectors.

Any proof of these facts ultimately depends on the assumption 
that the metric has the Euclidean signature + + + (or on equiv- 
alent assumptions such as Euclid’s axioms). Figure ai shows that 
on physical grounds, we do not expect the inequalities to hold for 
Minkowski vectors in their unmodified Euclidean forms. The quan-
tity |m + n| represents the proper time of the spaceship that moved
inertially along with the earth, while |m| + |n| is the smaller proper
time of the traveling spaceship.

On the other hand, Minkowski space has copies of Euclidean 
space built in. For example, we know that all the familiar Euclidean 
facts must hold in any plane of simultaneity defined by a particular 
observer at a given moment in time, since the restriction of the
metric to that plane has signature $---$, and the distinction between
this and the + + + signature is an arbitrary notational convention.

Summarizing these observations, we expect that the relativistic 
version of the triangle and Cauchy-Schwarz inequalities will be split 
into cases, some of which are the same as the Euclidean case and 
some of them different. 

Some notational issues may be confusing in the following discus-
sion. We let $a^2$ mean $a \cdot a$, which may not be positive, while $|a|$
indicates the positive real number $\sqrt{|a \cdot a|}$. I will try to specifically
point out any equations that are only true for $+---$ signature and
not for $-+++$, and express important final results in a way that
doesn’t depend on this choice.

1.5.1 Two timelike vectors

A simple and important case is the one in which both m and n
trace possible world-lines of material objects, as in figure ai. That
is, they must both be timelike vectors. To see what form of the
Cauchy-Schwarz inequality should hold, we break the vector n down
into two parts, $n = n_\parallel + n_\perp$, where $n_\parallel$ is parallel to m and $n_\perp$
perpendicular. We then have $|m \cdot n| = |m \cdot n_\parallel| = |m| |n_\parallel|$. But
$n^2 = (n_\parallel + n_\perp)^2 = n_\parallel^2 + 2 n_\parallel \cdot n_\perp + n_\perp^2 = n_\parallel^2 + n_\perp^2$, and since $n_\parallel$
is timelike and $n_\perp$ spacelike, we have (in the $+---$ signature)
$n_\parallel^2 > 0$ and $n_\perp^2 < 0$. Therefore, regardless of signature, $|n| \le |n_\parallel|$,
and we have the reversed Cauchy-Schwarz inequality

$$|m \cdot n| \ge |m| |n| \quad \text{(valid for either $+---$ or $-+++$)}.$$
A useful way of interpreting the reversal compared to the Euclidean
case is that if the vectors happen to be normalized such that |m| =
|n| = 1, then $m \cdot n = \gamma$, where $\gamma$ is the Lorentz factor for an
observer whose world-line is parallel to m with respect to a world-
line parallel to n. The difference from the Euclidean behavior can
then be understood as arising from the fact that whereas $|\cos\theta| \le 1$,
we always have $\gamma \ge 1$.
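
Both statements can be illustrated numerically in 1+1 dimensions. In this sketch the $+-$ signature is assumed, and the function names and the sample vectors p and q are arbitrary illustrative choices.

```python
import math

def inner(u, v):
    """Inner product in 1+1 dimensions, signature + -, components (t, x)."""
    return u[0] * v[0] - u[1] * v[1]

def norm(u):
    """|u| = sqrt(|u . u|), as defined in the text."""
    return math.sqrt(abs(inner(u, u)))

def unit_velocity(v):
    """Unit timelike vector along the world-line of an object moving at speed v."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return (g, g * v)

m = unit_velocity(0.0)
n = unit_velocity(0.6)
print(inner(m, n))                 # equals gamma for a relative speed of 0.6

p, q = (5.0, 3.0), (2.0, -1.0)     # two arbitrary timelike vectors
print(abs(inner(p, q)), norm(p) * norm(q))   # the first number is the larger
```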

Given the physical motivation presented so far, it would have 
been natural to take both m and n to lie in the future rather than the 
past light cone, but we have not yet assumed that this was the case, 
and the reversed Cauchy-Schwarz inequality holds independently of
such an assumption. (See problem 16 for an alternative way of see- 
ing this.) In order to discuss the related triangle inequality, however, 
we will need to assume that both vectors are future-directed. Phys- 
ically, this is necessary in order to give the interpretation shown 
in figure ai, from which we have already inferred that the triangle 
inequality must be reversed. To verify this mathematically, we can 
compute the difference $(m + n)^2 - (|m| + |n|)^2$ (problem 17).

An application to collisions is given in section 4.3.2, p. 89. 

1.5.2 Two spacelike vectors not spanning the light cone 

Now suppose that m and n are both spacelike, and the plane that 
they span does not include the light-cone. Operating within this 
plane, we never get any timelike or lightlike vectors, and therefore 
the non-Euclidean nature of the metric is never apparent to us. 
The geometry of this plane is therefore Euclidean, so in this case 
the ordinary Euclidean versions of the Cauchy-Schwarz and triangle 
inequalities must hold. 

No relativity required Example 18 

Suppose that a certain observer establishes Minkowski coordinates,
and consider the unit vectors $\hat{x}$ and $\hat{y}$ lying along the x
and y axes. The x-y plane that they span does not include the
light cone. By plugging in to the Minkowski-coordinate form of the
metric, we find that $\hat{x} \cdot \hat{y} = 0$, as expected since the
geometry of the x-y plane is Euclidean. This satisfies the ordinary
form of the Cauchy-Schwarz inequality.

1.5.3 Two spacelike vectors spanning the light cone 

Now consider the case, in Minkowski coordinates, where m = (0, 5, 0, 0)
and n = (4, 5, 0, 0). These vectors span the t-x plane, whose geometry
is not Euclidean, and they do not satisfy the Euclidean Cauchy-Schwarz
inequality, since $m \cdot n = -25$, so that $|m \cdot n| = 25$, whereas
$|m|\,|n| = 15$. Two vectors of this type will always satisfy the
reversed version of the Cauchy-Schwarz inequality (problem 18). The
converse holds in the sense that if two spacelike vectors satisfy the
strict inequality $|m \cdot n| > |m|\,|n|$, then they span the light cone.
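
The numbers in this example are easy to reproduce. The following sketch assumes Minkowski coordinates (t, x, y, z), c = 1, and the +−−− signature; the function names are illustrative. It also exhibits one lightlike vector in the plane spanned by m and n, which is what it means for the two vectors to span the light cone.

```python
import math

def dot(a, b):
    """Inner product in Minkowski coordinates (t, x, y, z), +--- signature."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def magnitude(a):
    """Magnitude taken as sqrt(|a . a|), so it also works for spacelike a."""
    return math.sqrt(abs(dot(a, a)))

m = (0, 5, 0, 0)
n = (4, 5, 0, 0)

print(dot(m, n))                      # inner product m . n
print(magnitude(m) * magnitude(n))    # product of the magnitudes

# A vector in the plane spanned by m and n: p = 5n - m = (20, 20, 0, 0).
p = tuple(5 * b - a for a, b in zip(m, n))
print(dot(p, p))                      # zero means p is lightlike
```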

Problem 5.


1 Astronauts in three different spaceships are communicating 
with each other. Those aboard ships A and B agree on the rate at 
which time is passing, but they disagree with the ones on ship C. 

(a) Alice is aboard ship A. How does she describe the motion of her 
own ship, in its frame of reference? 

(b) Describe the motion of the other two ships according to Alice. 

(c) Give the description according to Betty, whose frame of reference 
is ship B. 

(d) Do the same for Cathy, aboard ship C. 

2 What happens in the equation for $\gamma$ when you put in a
negative number for v? Explain what this means physically, and
why it makes sense.

3 The Voyager 1 space probe, launched in 1977, is moving faster 
relative to the earth than any other human-made object, at 17,000 
meters per second. 

(a) Calculate the probe’s $\gamma$.

(b) Over the course of one year on earth, slightly less than one year
passes on the probe. How much less? (There are 31 million seconds
in a year.) √

4 The earth is orbiting the sun, and therefore is contracted
relativistically in the direction of its motion. Compute the amount
by which its diameter shrinks in this direction. √

5 The figure shows seven displacement vectors in spacetime. 
Which of these represent spacetime intervals that are equal to one 
another? 

6 (a) In Euclidean geometry in three dimensions, suppose we
have two vectors, a and b, which are unit vectors, i.e., $a \cdot a = 1$
and $b \cdot b = 1$. What is the range of possible values for the inner
product $a \cdot b$?

(b) Repeat part a for two timelike, future-directed unit vectors in 
3+1 dimensions. 

7 Expressed in natural units, the Lorentz transformation is

$t' = \gamma t - v\gamma x$
$x' = -v\gamma t + \gamma x$.

(a) Insert factors of c to make it valid in units where $c \ne 1$. (b) Show
that in the limit $c \to \infty$, these have the right Galilean behavior.

8 This problem assumes you have some basic knowledge of quantum
physics. One way of expressing the correspondence principle as
applied to special relativity is that in the limit $c \to \infty$, all
relativistic expressions have to go over to their Galilean counterparts.
What would be the corresponding limit if we wanted to recover classical
mechanics from quantum mechanics?

9 In 3+1 dimensions, prove that if u and v are nonzero,
future-lightlike, and not parallel to each other, then their sum is
future-timelike.

10 Prove that if u and v are nonzero, lightlike, and orthogonal
to each other, then they are parallel, i.e., u = cv for some $c \ne 0$.

11 The speed at which a disturbance travels along a string
under tension is given by $v = \sqrt{T/\mu}$, where $\mu$ is the mass per
unit length, and T is the tension.

(a) Suppose a string has a density $\rho$, and a cross-sectional area A.
Find an expression for the maximum tension that could possibly
exist in the string without producing v > c, which is impossible
according to relativity. Express your answer in terms of $\rho$, A, and
c. The interpretation is that relativity puts a limit on how strong
any material can be. √

(b) Every substance has a tensile strength, defined as the force
per unit area required to break it by pulling it apart. The tensile
strength is measured in units of N/m$^2$, which is the same as the
pascal (Pa), the mks unit of pressure. Make a numerical estimate
of the maximum tensile strength allowed by relativity in the case
where the rope is made out of ordinary matter, with a density on
the same order of magnitude as that of water. (For comparison,
kevlar has a tensile strength of about $4 \times 10^{9}$ Pa, and there is
speculation that fibers made from carbon nanotubes could have values
as high as $6 \times 10^{10}$ Pa.) √

(c) A black hole is a star that has collapsed and become very dense, 
so that its gravity is too strong for anything ever to escape from it. 
For instance, the escape velocity from a black hole is greater than 
c, so a projectile can’t be shot out of it. Many people, when they 
hear this description of a black hole in terms of an escape velocity 
greater than c, wonder why it still wouldn’t be possible to extract 
an object from a black hole by other means than launching it out 
as a projectile. For example, suppose we lower an astronaut into a 
black hole on a rope, and then pull him back out again. Why might 
this not work? 

12 The rod in the figure is perfectly rigid. At event A, the
hammer strikes one end of the rod. At event B, the other end moves. 
Since the rod is perfectly rigid, it can’t compress, so A and B are 
simultaneous. In frame 2, B happens before A. Did the motion at 
the right end cause the person on the left to decide to pick up the 
hammer and use it? 


Problem 12. 



13 Use a spacetime diagram to resolve the following relativity 
paradox. Relativity says that in one frame of reference, event A 
could happen before event B, but in someone else’s frame B would 
come before A. How can this be? Obviously the two people could 
meet up at A and talk as they cruised past each other. Wouldn’t 
they have to agree on whether B had already happened? 

14 The grid represents spacetime in a certain frame of reference. 
Event A is marked with a dot. Mark additional points satisfying the 
following criteria. (Pick points that lie at the intersections of the 
gridlines.) 

Point B is at the same location as A in this frame of reference, and 
lies in its future. 

C is also in point A’s future, is not at the same location as A in 
this frame, but is in the same location as A according to some other 
frame of reference. 

D is simultaneous with A in this frame of reference. 

E is not simultaneous with A in this frame of reference, but is
simultaneous with it according to some other frame.

Problem 14.

F lies in A’s past according to this frame of reference, but could not 
have caused A. 

G lies in A’s future according to this frame of reference, but is in its 
past according to some other frames. 

H lies in A’s future according to any frame of reference, not just this 
one. 

I is the departure of a spaceship, which arrives at A.

J could have caused A, but could not have been the departure of a 
spaceship like I that arrived later at A. 

15 (a) Given an observer whose world-line is along a four-vector
O, suppose we want to determine whether some other four-vector
P is also a possible world-line of an observer. Show that knowledge
of the signs of the inner products $O \cdot P$ and $P \cdot P$ is necessary
and sufficient to determine this.

(b) Suppose that U and V are both observer-vectors. What would
it mean physically to compute U + V?

(c) For vectors as described in part b, determine the signs of 

$(U + V) \cdot (U + V)$

and

$(U + V) \cdot U$

by multiplying them out. Interpret the result physically. 

16 In section 1.5.1, we proved the reversed Cauchy-Schwarz 
inequality for two timelike vectors, without any assumption as to 
whether they lay in the future or past light cones. But suppose that 
we had only established this fact for two vectors that were both 
future-directed. Show that the same inequality would then also have 
to hold regardless of whether one or both vectors was past-directed. 

17 In the case of two future-directed timelike vectors, com- 
plete the proof of the reversed triangle inequality using the method 
suggested in section 1.5.1. 

18 In section 1.5.3 we claimed that the reversed Cauchy-Schwarz 
inequality holds for two spacelike vectors that span the light cone. 
The purpose of this problem is to prove this fact using the following 
sketch of an argument provided by PhysicsForums user martinbn. 
Suppose that spacelike vectors m and n span the light cone, so that
we can find some real number $\alpha$ such that $p = \alpha m + n$ is
lightlike. Compute $p^2$, and show that since $\alpha$ is real, the
reversed Cauchy-Schwarz inequality holds.

19 A length-contracted object has length $L = L_0/\gamma$. Joe
differentiates this with respect to time and finds
$dL/dt = -L_0 v \gamma \, dv/dt$.
He reasons that there is no upper limit on the magnitude of dv/dt,
and therefore if $v\gamma \ne 0$ the quantity dL/dt can be arbitrarily large.
This means that if an object accelerates away from an observer, its
trailing edge can have v > c, which is supposed to be forbidden by
relativity. OMG! Is Joe’s reasoning correct?



Chapter 2 

Foundations (optional) 


In this optional chapter we more systematically examine the foun- 
dational assumptions of special relativity, which were appealed to 
casually in chapter 1. Most readers will want to skip this chapter 
and move on to ch. 3. The ordering of chapters 1 and 2 may seem 
backwards, but many of the issues to be raised here are very subtle 
and hard to appreciate without already understanding something 
about special relativity — in fact, Einstein and other relativists did 
not understand them properly until decades after the introduction 
of special relativity in 1905. 


2.1 Causality 

2.1.1 The arrow of time 

Our intuitive belief in cause-and-effect mechanisms is not sup- 
ported in any clearcut way by the laws of physics as currently un- 
derstood. For example, we feel that the past affects the future but 
not the other way around, but this feeling doesn’t seem to translate 
into physical law. For example, Newton’s laws are invariant un- 
der time reversal, figure a, as are Maxwell’s equations. (The weak 
nuclear force is the only part of the standard model that violates 
time-reversal symmetry, and even it is invariant under the CPT 
transformation.) 

There is an arrow of time provided by the second law of thermo- 
dynamics, and this arises ultimately from the fact that, for reasons 
unknown to us, the universe soon after the Big Bang was in a state 
of extremely low entropy. 1 

2.1.2 Initial-value problems 

So rather than depending on the arrow of time, we may be better 
off formulating a notion of causality based on existence and unique- 
ness of initial-value problems. In 1776, Laplace gave an influential 
early formulation of this idea in the context of Newtonian mechan- 
ics: “Given for one instant an intelligence which could comprehend 
all the forces by which nature is animated and the respective positions
of the things which compose it . . . nothing would be uncertain,
and the future as the past would be laid out before its eyes.” The
reference to “one instant” is not compatible with special relativity,
which has no frame-independent definition of simultaneity. We can,
however, define initial conditions on some spacelike three-surface,
i.e., a three-dimensional set of events that is smooth, has the
topology of Euclidean space, and whose events are spacelike in relation
to one another.

1 One can find a vast amount of nonsense written about this, such as claims
that the second law is derivable without reference to any cosmological
context. For a careful treatment, see Callender, “Thermodynamic Asymmetry
in Time,” The Stanford Encyclopedia of Philosophy,
plato.stanford.edu/archives/fall2011/entries/time-thermo.

a / Newton’s laws do not distinguish past from future. The football could
travel in either direction while obeying Newton’s laws.

Unfortunately it is not obvious whether the classical laws of 
physics satisfy Laplace’s definition of causality. Two interesting 
and accessible papers that express a skeptical view on this issue are
Norton, “Causation as Folk Science,” philsci-archive.pitt.edu/1214;
and Echeverria et al., “Billiard balls in wormhole spacetimes
with closed timelike curves: Classical theory,”
http://resolver.caltech.edu/CaltechAUTHORS:ECHprd91. The Norton paper in
particular has generated a large literature at the interface between 
physics and philosophy, and one can find most of the relevant ma- 
terial online using the keywords “Norton’s dome.” 

Nor does general relativity offer much support to the Laplacian 
version of causality. For example, general relativity says that given 
generic initial conditions, gravitational collapse leads to the forma- 
tion of singularities, points where the structure of spacetime breaks 
down and various measurable quantities become infinite. Singu- 
larities typically violate causality, since the laws of physics can’t 
describe them. In a famous image, John Earman wrote that if we 
have a certain type of singularity (called a naked singularity), “all 
sorts of nasty things . . . emerge helter-skelter . . . ,” including “TV 
sets showing Nixon’s ‘Checkers’ speech, green slime, Japanese horror 
movie monsters, etc.” 

2.1.3 A modest definition of causality 

Since there does not seem to be any reason to expect causality 
to hold in any grand sense, we will content ourselves here with a 
very modest and specialized definition, stated as a postulate, that 
works well enough for special relativity. 

P1. Causality. There exist events 1 and 2 such that the displacement
vector $\Delta r_{12}$ is timelike in all frames.

This is sufficient to rule out the “rotational” version of the
Lorentz transformation shown in figure j on p. 16. If P1 were
violated, then we could never describe one event as causing another,
since there would always be frames of reference in which the effect
was observed as preceding the cause.

2.2 Flatness

2.2.1 Failure of parallelism

In postulate P1 we implicitly assumed that given two points,
there was a certain vector connecting them. This is analogous to 
the Euclidean postulate that two points define a line. 

For insight, let’s think about how the Euclidean version of this 
assumption could fail. Euclidean geometry is only an approximate 
description of the earth’s surface, for example, and this is why flat 
maps always entail distortions of the actual shapes. The distortions 
might be negligible on a map of Connecticut, but severe for a map 
of the whole world. That is, the globe is only locally Euclidean. 
On a spherical surface, the appropriate object to play the role of a 
“line” is a great circle, figure b. The lines of longitude are examples 
of great circles, and since these all coincide at the poles, we can see 
that two points do not determine a line in noneuclidean geometry. 

A two-dimensional bug living on the surface of a sphere would 
not be able to tell that the sphere was embedded in a third dimen- 
sion, but it could still detect the curvature of the surface. It could 
tell that Euclid’s postulates were false on large distance scales. A 
method that has a better analog in spacetime is shown in figure 
c: transporting a vector from one point to another depends on the 
path along which it was transported. This effect is our definition of 
curvature. 
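
The path dependence can be made concrete with a small computation. The sketch below is an illustration under stated assumptions, not a construction taken from the text: it works on a unit sphere, takes A to be the north pole and B and C to be two points on the equator a quarter-turn apart, and transports a tangent vector along great-circle arcs by rotating it about the axis perpendicular to the plane of each arc. The function names are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transport(v, p, q):
    """Carry tangent vector v from p to q along the great-circle arc pq
    (unit sphere; p and q must be neither equal nor antipodal)."""
    axis = cross(p, q)
    s = math.sqrt(dot(axis, axis))          # sine of the arc angle
    k = tuple(x / s for x in axis)          # unit rotation axis
    theta = math.atan2(s, dot(p, q))        # arc angle
    c, sn = math.cos(theta), math.sin(theta)
    kxv, kv = cross(k, v), dot(k, v)
    # Rodrigues rotation formula
    return tuple(v[i] * c + kxv[i] * sn + k[i] * kv * (1 - c) for i in range(3))

A = (0.0, 0.0, 1.0)   # north pole
B = (1.0, 0.0, 0.0)   # on the equator
C = (0.0, 1.0, 0.0)   # on the equator, a quarter-turn from B

v0 = (1.0, 0.0, 0.0)  # tangent vector at A

direct = transport(v0, A, C)                   # path AC
via_b = transport(transport(v0, A, B), B, C)   # path ABC

cos_angle = max(-1.0, min(1.0, dot(direct, via_b)))
angle = math.acos(cos_angle)
print(math.degrees(angle))   # discrepancy between the two results, in degrees
```

For this particular triangle, which covers one eighth of the sphere, the two transported vectors end up at right angles to each other. On a flat surface the same procedure would give zero discrepancy.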

2.2.2 Parallel transport 

The particular type of transport that we have in mind here is 
called parallel transport. When I walk from the living room to 
the kitchen while carrying a mechanical gyroscope, I’m parallel- 
transporting the spacelike vector indicated by the direction of its 
axis. Figure d shows that parallel transport can also be defined for 
timelike vectors, and that parallel transport can be defined in space- 
time using only inertial motion, clocks, and intersection of world- 
lines. Observers aboard the two spaceships use clocks in order to 
verify the parallelism of their world-lines (vectors AB and CD, which
have equal lengths as measured by the proper time elapsed aboard 
the ships). Observer AB shoots clocks to observer CD, and the 
clocks are set up so that when they pass by one another, they auto- 
matically record one another’s readings. The vectors are parallel if 
the record later reveals AD and BC intersected at their midpoints, 
as measured by the proper times recorded on the clocks. 



b / An airplane flying from
Mexico City to London follows 
the shortest path, which is a 
segment of a great circle. A path 
of extremal length between two 
points is called a geodesic. 



c / Transporting the vector 
along path AC gives a different 
result than doing it along the path 
ABC. 



d / Parallel transport. 

2.2.3 Special relativity requires flat spacetime

Hidden in a number of spots in chapter 1 was the following 
assumption. 

P2. Flatness of spacetime. Parallel-transporting a vector from 
one point to another gives a result that is independent of the path 
along which it was transported. 

For example, when we established the form of the metric in sec- 
tion 1.3.2, we used the fact, proved on p. 49, that area is a scalar, 
but that proof depends on P2. 

Property P2 is only approximately true, as shown explicitly by 
the Gravity Probe B satellite, launched in 2004. The probe carried 
four gyroscopes made of quartz, which were the most perfect spheres 
ever manufactured, varying from sphericity by no more than about 
40 atoms. After one year and about 5000 orbits around the earth, 
the gyroscopes were found to have changed their orientations relative 
to the distant stars by about $3 \times 10^{-6}$ radians (figure e). This is a
violation of P2, but one that was very small and difficult to detect.
The result was in good agreement with the predictions of general 
relativity, which describes gravity as a curvature of spacetime. The 
smallness of the effect tells us that the earth’s gravitational field 
is not so large as to completely invalidate special relativity as a 
description of the nearby region of spacetime. One of the basic 
assumptions of general relativity is that in a small enough region 
of spacetime, it is always a good approximation to assume P2, so 
that general relativity is locally the same as special relativity. In 
the Gravity Probe B experiment, the effect was small and hard to 
detect, and this was the reason for letting the effect accumulate 
over a large number of orbits, spanning a large region of spacetime. 
Problem 5 on p. 52 investigates more quantitatively how the size of 
curvature effects varies with the size of the region. 


2.3 Additional postulates 

We make the following additional assumptions: 

P3 Spacetime is homogeneous and isotropic. No time or place
has special properties that make it distinguishable from other
points, nor is one direction in space distinguishable from
another. 2

P4 Inertial frames of reference exist. These are frames in which
particles move at constant velocity if not subject to any forces. 3
We can construct such a frame by using a particular particle,
which is not subject to any forces, as a reference point. Inertial
motion is modeled by vectors and parallelism.

P5 Equivalence of inertial frames: If a frame is in constant-velocity
translational motion relative to an inertial frame, then it is also
an inertial frame. No experiment can distinguish one preferred
inertial frame from all the others.

P6 Relativity of time: There exist events 1 and 2 and frames of
reference defined by observers o and o' such that $o \perp r_{12}$ is
true but $o' \perp r_{12}$ is false, where the notation $o \perp r$ means
that observer o finds r to be a vector of simultaneity according to
some convenient criterion such as Einstein synchronization. 4

Postulates P3 and P5 describe symmetries of spacetime, while
P6 differentiates the spacetime of special relativity from Galilean
spacetime; the symmetry described by these three postulates is
referred to as Lorentz invariance, and all known physical laws have
this symmetry. Postulate P4 defines what we have meant when we
referred to the parallelism of vectors in spacetime (e.g., in figure
s on p. 26). Postulates P1-P6 were all the assumptions that were
needed in order to arrive at the picture of spacetime described in
ch. 1. This approach, based on symmetries, dates back to 1911. 5

Surprisingly, it is possible for space or spacetime to obey our
flatness postulate P2 while nevertheless having a nontrivial topology,
such as that of a cylinder or a Möbius strip (cf. problem 4, p. 51,
and sec. 7.6.2, p. 154). Many authors prefer to explicitly rule out
such possibilities as part of their definition of special relativity.

2 For the experimental evidence on isotropy, see
http://www.edu-observatory.org/physics-faq/Relativity/SR/experiments.html#Tests_of_isotropy_of_space.

3 Defining this no-force rule turns out to be tricky when it comes to
gravity. As discussed in ch. 5, this apparently minor technicality
turns out to have important consequences.

4 example 4, p. 18

5 W. v. Ignatowsky, Phys. Zeits. 11 (1911) 972. English translation at
en.wikisource.org/wiki/Translation:Some_General_Remarks_on_the_Relativity_Principle

e / Precession angle as a function of time as measured by the four gyroscopes aboard Gravity Probe B.

2.4 Other axiomatizations 

2.4.1 Einstein’s postulates 

Einstein used a different axiomatization in his 1905 paper on 
special relativity: 6 

E1. Principle of relativity: The laws of electrodynamics and
optics are valid for all frames of reference for which the equations of 
mechanics hold good. 

E2. Light is always propagated in empty space with a definite 
velocity c which is independent of the state of motion of the emitting 
body. 

These should be supplemented with our P2 and P3. 

Einstein’s approach has been slavishly followed in many later 
textbook presentations, even though the special role it assigns to 
light is not consistent with how modern physicists think about the 
fundamental structure of the laws of physics. (In 1905 there was 
no other phenomenon known to travel at c.) Einstein did not ex- 
plicitly state anything like our P2 (flatness), since he had not yet 
developed the theory of general relativity or the idea of representing 
gravity in relativity as spacetime curvature. When he did publish 
the general theory, he described the distinction between special and 
general relativity as a generalization of the class of acceptable frames 
of reference to include accelerated as well as inertial frames. This 
description has not stood the test of time, and today relativists 
use flatness as the distinguishing criterion. In particular, it is not 
true, as one sometimes still hears claimed, that special relativity is 
incompatible with accelerated frames of reference. 

2.4.2 Maximal time 

Another approach, presented, e.g., by Laurent, 7 combines our 
P2 with the following: 


T1 Metric: An inner product exists. Proper time is measured by 
the square root of the inner product of a world-line with itself. 

T2 Maximum proper time: Inertial motion gives a world-line along
which the proper time is at a maximum with respect to small
changes in the world-line. Inertial motion is modeled by vectors
and parallelism, and this vector-space apparatus has the
usual algebraic properties in relation to the inner product
referred to in T1, e.g., $a \cdot (b + c) = a \cdot b + a \cdot c$.

6 Paraphrased from the translation by W. Perrett and G.B. Jeffery.

7 Bertel Laurent, Introduction to Spacetime: A First Course on Relativity


We have already seen an example of T2 in our analysis of the 
twin paradox (figure s on p. 26). Conceptually, T2 is similar to 
defining a line as the shortest path between two points, except that
we define a geodesic as being the longest one (for our +−−−
signature).
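
To see T2 at work numerically, one can compare the proper time along an inertial world-line with the proper time along nearby world-lines that share the same endpoints. This is a minimal sketch assuming 1+1 dimensions, c = 1, and world-lines built from straight timelike segments; the function name `proper_time` and the sample events are illustrative choices, not taken from the text.

```python
import math

def proper_time(events):
    """Total proper time along a world-line given as a list of (t, x) events
    joined by straight segments. Each segment must be timelike (|dx| < |dt|)."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt * dt - dx * dx)
    return total

inertial = [(0.0, 0.0), (10.0, 0.0)]
small_detour = [(0.0, 0.0), (5.0, 0.1), (10.0, 0.0)]
large_detour = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]

print(proper_time(inertial))
print(proper_time(small_detour))
print(proper_time(large_detour))
```

The inertial world-line gives the largest value, and the more the path deviates from it, the less proper time elapses, which is the twin paradox in miniature.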

2.4.3 Comparison of the systems 

It is useful to compare the axiomatizations P, E, and T from 
sections 2.1.1-2.4.2 with each other in order to gain insight into how
much “wiggle room” there is in constructing theories of spacetime. 
Since they are logically equivalent, any statement occurring in one 
axiomatization can be proved as a theorem in any one of the others. 

For example, we might wonder whether it is possible to equip 
Galilean spacetime with a metric. The answer is no, since a system 
with a metric would satisfy the axioms of system T, which are log- 
ically equivalent to our system P. The underlying reason for this is 
that in Galilean spacetime there is no natural way to compare the 
scales of distance and time. 

Or we could ask whether it is possible to compose variations on 
the theme of special relativity, alternative theories whose properties 
differ in some way. System P shows that this would be unlikely to 
succeed without violating the symmetry of spacetime. 

Another interesting example is Amelino-Camelia’s doubly-special
relativity, 8 in which we have both an invariant speed c and an
invariant length L, which is assumed to be the Planck length
$\sqrt{\hbar G/c^3}$.
The invariance of this length contradicts the existence of length 
contraction. In order to make his theory work, Amelino-Camelia is 
obliged to assume that energy-momentum vectors (section 4.3) have 
their own special inner product that violates the algebraic properties 
referred to in T2. 
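
For a sense of scale, the Planck length can be evaluated directly from its definition. The constants below are standard SI values that I am supplying for illustration; they do not appear in the text.

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J s
G = 6.67430e-11          # Newton's gravitational constant, m^3 kg^-1 s^-2
c = 299792458.0          # speed of light, m/s

planck_length = math.sqrt(hbar * G / c**3)
print(planck_length)     # in meters
```

The result is of order $10^{-35}$ m, roughly twenty orders of magnitude smaller than an atomic nucleus.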

2.5 Lemma: spacetime area is invariant 

In this section we prove from axioms P1-P6 that area in the x-t
plane is invariant, i.e., it does not change between frames of
reference. This result was used in section 1.3.2 to find the form of the
spacetime metric. 

Consider figure f. Vectors $o_1$ and $s_1$ are orthogonal and have
equal lengths as measured by a clock and a ruler (which are
calibrated in units such that c = 1, e.g., seconds and light-seconds). The
square lattice of white polka-dots is obtained from them by repeated
addition. By assuming that this lattice construction is possible, we
are implicitly assuming postulate P2, flatness of spacetime.

8 arxiv.org/abs/gr-qc/0012051

f / Area is a scalar.

The same properties hold for vectors $o_2$ and $s_2$, which give the
lattice of black dots. As required, the two lattices agree on their 
45-degree diagonals. Now within the 10 x 10 portion of the white 
lattice shown with gray shading, we have an area of 100. In the 
same region we count about 100 or 101 black dots — there is some 
ambiguity because of the dots that lie on the boundary. The density 
of white and black dots is in fact exactly equal, as can be verified 
to any desired precision by making the region big enough. In other 
words, the diagram is drawn so that area is preserved, which is what 
we are going to show is required. 

If it was observer 2 rather than 1 who was drawing the diagram, 
presumably she would choose to draw the black dots in a square 
lattice and vectors $o_2$ and $s_2$ at right angles. This would require
vectors $o_1$ and $s_1$ to be opened up at an oblique angle and the white
lattice to be non-square. 

Now suppose we had not made area conserved. What if a region 
containing 100 white dots had held 200 black ones? Dot-counting is 
how the observers define area, so if this happened, they would have 
to agree that a boost by v, from frame 1 to frame 2, doubled the area 
of the gray region. Because spacetime is flat (P2) and homogeneous 
(P3), it is possible to take a geometrical shape inscribed in a certain 
region of spacetime and move, rotate, or flip it. And by isotropy 
of space (P3), a boost of velocity v is the same as a flip of the 
spatial dimension followed by a -v boost and another flip. Area is
conserved by a flip, so we find that a boost by -v, from frame 2
to frame 1, also doubles area. Thus a +v boost followed by a -v
boost would cause a quadrupling of area. But a pair of equal and 
opposite boosts cancels out, so this is a contradiction. We conclude 
that if these symmetry principles hold, then spacetime area is the 
same for any two observers, so it is an invariant. 
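
Once the Lorentz transformation is known, the conclusion can be checked after the fact. The sketch below is a consistency check rather than part of the proof, since the lemma is itself used in deriving the metric. It assumes natural units and the 1+1-dimensional transformation $t' = \gamma t - v\gamma x$, $x' = -v\gamma t + \gamma x$ quoted in problem 7 of chapter 1; the function names are illustrative.

```python
import math

def boost(v):
    """Matrix of the 1+1-dimensional Lorentz transformation for velocity v (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return ((g, -v * g), (-v * g, g))

def apply(L, e):
    """Apply the 2x2 matrix L to the vector e = (t, x)."""
    t, x = e
    return (L[0][0] * t + L[0][1] * x, L[1][0] * t + L[1][1] * x)

def area(a, b):
    """Signed area of the parallelogram spanned by a and b in the t-x plane."""
    return a[0] * b[1] - a[1] * b[0]

o1, s1 = (1.0, 0.0), (0.0, 1.0)     # observer 1's unit vectors
L = boost(0.6)
o2, s2 = apply(L, o1), apply(L, s1)

print(area(o1, s1), area(o2, s2))   # area of one lattice cell before and after

back = apply(boost(-0.6), o2)       # a +v boost followed by a -v boost
print(back)
```

The cell area is unchanged by the boost, and the pair of equal and opposite boosts returns the original vector, in agreement with the cancellation used in the argument above.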

It may seem unnecessarily clumsy that we’ve used the idea of 
counting dots in the above argument, but remember that our main 
use of this result is to derive the form of the metric, and before 
the metric had been found, we had no system of measurement for 
relativity, so we had only very primitive techniques at our disposal. 

Problems


1 Section 2.5 gives an argument that spacetime area is a rela- 
tivistic invariant. Is this argument also valid for Galilean relativity? 

2 Section 2.5 gives an argument that spacetime area is a
relativistic invariant. (a) Generalize this from 1+1 dimensions to 3+1.
(b) Use this result to prove that there is no relativistic length
contraction effect along an axis perpendicular to the velocity.

3 The purpose of this problem is to find how the direction of a 
physical object such as a stick changes under a Lorentz transforma- 
tion. Part b of problem 2 shows that relativistic length contraction 
occurs only along the axis parallel to the motion. The generalization 
of the 1+1-dimensional Lorentz transformation to 2+1 dimensions
therefore consists simply of augmenting equation (1) on p. 31 with
y' = y. Suppose that a stick, in its own rest frame, has one end
with a world-line $(\tau, 0, 0)$ and the other with $(\tau, p, q)$, where
$\tau$ is the stick’s proper time. Call these ends A and B. In other
words, we have a stick that goes from the origin to coordinates (p, q)
in the (x, y) plane. Apply a Lorentz transformation for a boost with
velocity v in the x direction, and find the equations of the world-lines
of the ends of the stick in the new (t', x', y') coordinates. According
to this new frame’s notion of simultaneity, find the coordinates of
B when A is at (t', x', y') = (0, 0, 0). (a) In the special case where
q = 0, recover the 1+1-dimensional result for length contraction
given on p. 27. (b) Returning to the general case where $q \ne 0$,
consider the angle $\theta$ that the stick makes with the x axis, and the
related angle $\theta'$ that it makes with the x' axis in the new frame.
Show that $\tan\theta' = \gamma \tan\theta$.

4 Section 2.2 discusses the idea that a two-dimensional bug 
living on the surface of a sphere could tell that its space was curved. 
Figure c on p. 45 shows one way of telling, by detecting the path- 
dependence of parallel transport. A different technique would be to 
look for violations of the Pythagorean theorem. In the figure below, 
1 is a diagram illustrating the proof of the Pythagorean theorem in 
Euclid’s Elements (proposition I.47). This diagram is equally valid
if the page is rolled onto a cylinder, 2, or formed into a wavy
corrugated shape, 3. These types of curvature, which can be achieved
without tearing or crumpling the surface, are not real to the bug. 
They are simply side-effects of visualizing its two-dimensional uni- 
verse as if it were embedded in a hypothetical third dimension — 
which doesn’t exist in any sense that is empirically verifiable to the 
bug. Of the curved surfaces in the figure, only the sphere, 4, has 
curvature that the bug can measure; the diagram can’t be plastered 
onto the sphere without folding or cutting and pasting. If a two- 
dimensional being lived on the surface of a cone, would it say that 
its space was curved, or not? What about a saddle shape?

Problem 4.

5 The discrepancy in parallel transport shown in figure c on
p. 45 can also be interpreted as a measure of the triangle’s angular
defect $d$, meaning the amount $S - \pi$ by which the sum of its interior
angles $S$ exceeds the Euclidean value. (a) The figure suggests a
simple way of verifying that the angular defect of a triangle inscribed
on a sphere depends on area. It shows a large equilateral triangle
that has been dissected into four smaller triangles, each of which
is also approximately equilateral. Prove that $D = 4d$, where $D$ is
the angular defect of the large triangle and $d$ the value for one of
the four smaller ones. (b) Given that the proportionality to area
$d = kA$ holds in general, find some triangle on a sphere of radius $R$
whose area and angular defect are easy to calculate, and use it to
fix the constant of proportionality $k$.

Remark: A being who lived on a sphere could measure d and A for some triangle 
and infer R, which is a measure of curvature. The proportionality of the effect 
to the area of the triangle also implies that the effects of curvature become 
negligible on sufficiently small scales. The analogy in relativity is that special 
relativity is a valid approximation to general relativity in regions of space that 
are small enough so that spacetime curvature becomes negligible. 


Problem 5.

Chapter 3

Kinematics


At this stage, many students raise the following questions, which 
turn out to be related to one another: 

1. According to Einstein, if observers A and B aren’t at rest 
relative to each other, then A says B’s time is slow, but B 
says A is the slow one. How can this be? If A says B is slow, 
shouldn’t B say A is fast? After all, if I took a pill that sped up 
my brain, everyone else would seem slow to me, and I would 
seem fast to them. 

2. Suppose I keep accelerating my spaceship steadily. What hap- 
pens when I get to the speed of light? 

3. In all the diagrams in section 1.4, the parallelograms have their 
diagonals stretched and squished by a certain factor, which 
depends on v. What is the interpretation of this factor?

3.1 How can they both . . . ?

Figure a shows how relativity resolves the first question. If A and B 
had an instantaneous method of communication such as Star Trek’s 
subspace radio, then they could indeed resolve the question of who 
was really slow. 


a / Signals don’t resolve the dispute over who is really slow.




But relativity does not allow cause and effect to be propagated 
outside the light cone, so the best they can actually do is to send 
each other signals at c. In a/1, B sends signals to A at time intervals 
of one hour as measured by B’s clock. According to A’s clock, the 
signals arrive at an interval that is shorter than one hour as the two 
spaceships approach one another, then longer than an hour after 
they pass each other and begin to recede. As shown in a/2, the 
situation is entirely symmetric if A sends signals to B. 

Who is really slow? Neither. If A, like many astronauts, cut her 
teeth as a jet pilot, it may occur to her to interpret the observations 
by analogy with the Doppler effect for sound waves. Figure a is 
in fact a valid diagram if the signals are clicks of sound, provided 
that we interpret it as being drawn in the frame of reference of the 
air. Sound waves travel at a fixed speed relative to the air, and 
the space and time units could be chosen such that the speed of 
sound was represented by a slope of ±1. But A will find that in 
the relativistic case, with signals traveling at c, her observations 
of the time intervals are not in quantitative agreement with the 
predictions she gets by plugging numbers into the familiar formulas 
for the Doppler shift of sound waves. She may then say, “Ah, the 
analogy with sound isn’t quite right. I need to include a correction
factor for time dilation, since B’s time is slow. I’m not slow, of
course. I feel perfectly normal.” 

But her analogy is false and needlessly complicates the situation. 
In the version with sound waves and Galilean relativity, there are 
three frames of reference involved: A’s, B’s, and the air’s. The rela- 
tivistic version is simpler, because there are only two frames, A’s and 
B’s. It’s neither helpful nor necessary to break down the observa- 
tions into a factor describing what “really” happens and a correction 
factor to account for the relativistic distortions of “reality.” All we 
need to worry about is the world-lines and intersections of world- 
lines shown in the spacetime diagrams, along with the metric, which 
allows us to compute how much proper time is experienced by each 
observer. 




b / The twin paradox with signals sent back to earth by the traveling
twin.


3.2 The stretch factor is the Doppler shift 

c / Interpretation of the identity $D(v)D(-v) = 1$.

Figure b shows how the ideas in the preceding section apply to the
twin paradox. In b/1 we see the situation as described by an impartial
observer, who says that both twins are traveling to the right.
But even the impartial observer agrees that one twin’s motion is
inertial and the other’s noninertial, which breaks the symmetry and
also allows the twins to meet up at the end and compare clocks.
For convenience, b/2 shows the situation in the frame where the
earthbound twin is at rest. Both panels of the figure are drawn such
that the relative velocity of the twins is 3/5, and in panel 2 this
is the inverse slope of the traveling twin’s world-lines. Straightforward
algebra and geometry (problem 6, p. 76) shows that in this
particular example, the period observed by the earthbound twin is
increased by a factor of 2. But 2 is exactly the factor by which the
diagonals of the parallelogram are stretched and compressed in a
Lorentz transformation for a velocity of 3/5. This is true in general:
the stretching and squishing factors for the diagonals are the same
as the Doppler shift. We notate this factor as D (which can stand
for either “Doppler” or “diagonal”), and in general it is given by

$$D(v) = \sqrt{\frac{1+v}{1-v}}$$

(problem 7, p. 76).
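
As a quick numerical check of this formula, here is a minimal Python sketch, in units with c = 1. The function names `doppler` and `gamma` are illustrative choices of mine, not notation from the text. It evaluates D for the velocity 3/5 used in figure b, and also the time-dilation factor at the same velocity, which suggests that the two factors (2 and 5/4) should not be confused.

```python
import math

def doppler(v):
    """Longitudinal Doppler factor D(v), with v in units where c = 1."""
    if not -1.0 < v < 1.0:
        raise ValueError("v must lie strictly between -1 and 1")
    return math.sqrt((1.0 + v) / (1.0 - v))

def gamma(v):
    """Time-dilation factor for the same velocity."""
    return 1.0 / math.sqrt(1.0 - v * v)

v = 3 / 5
receding = doppler(v)      # period stretched while source and observer separate
approaching = doppler(-v)  # period compressed while they approach
print(f"D(3/5) = {receding:.3f}, D(-3/5) = {approaching:.3f}, gamma = {gamma(v):.3f}")
```

The two Doppler factors come out as reciprocals of one another, which is the identity D(v)D(-v) = 1.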


self-check A 

If you measure with a ruler on figure b/2, you will find that the labeled 
sides of the quadrilateral differ by less than a factor of 2. Why is this? 
> Answer, p. ?? 


This expression is for the longitudinal Doppler shift, i.e., the case 
where the source and observer are in motion directly away from one 
another (or toward one another if v < 0). In the purely transverse 
case, there is a Doppler shift $1/\gamma$, which can be interpreted as simply
a measure of time dilation. 


The useful identity $D(v)D(-v) = 1$ is trivial to prove algebraically,
and has the following interpretation. Suppose, as in figure
c, that A and C are at rest relative to one another, but B is moving
relative to them. B’s velocity relative to A is $v$, and C’s relative
to B is $-v$. At regular intervals, A sends lightspeed “pings” to B,
who then immediately retransmits them to C. The interval between
pings accumulates two Doppler shifts, and the result is their product
$D(v)D(-v)$. But B didn’t actually need to receive the original
signal and retransmit it; the results would have been the same if B
had just stayed out of the way. Therefore this product must equal
1, so $D(v)D(-v) = 1$.

Ives-Stilwell experiments Example 1 

The transverse Doppler shift is a characteristic prediction of spe- 
cial relativity, with no nonrelativistic counterpart, and Einstein sug- 
gested it early on as a test of relativity. However, it is difficult to 
measure with high precision, because the results are sensitive 
to any error in the alignment of the 90-degree angle. Such ex- 
periments were eventually performed, with results that confirmed 
relativity,¹ but one-dimensional measurements provided both the
earliest tests of the relativistic Doppler shift and the most precise
ones to date. The first such test was done by Ives and Stilwell
in 1938, using the following trick. The relativistic expression
$D(v) = \sqrt{(1+v)/(1-v)}$ for the Doppler shift has the property
that $D(v)D(-v) = 1$, which differs from the nonrelativistic result
of $(1+v)(1-v) = 1 - v^2$. One can therefore accelerate an
ion up to a relativistic speed, measure both the forward Doppler
shifted frequency $f_f$ and the backward one $f_b$, and compute $\sqrt{f_f f_b}$.
According to relativity, this should exactly equal the frequency $f_0$
measured in the ion’s rest frame.

1 See, e.g., Hasselkamp, Mondry, and Scharmann, Zeitschrift für Physik A:
Hadrons and Nuclei 289 (1979) 151.

In a particularly exquisite modern version of the Ives-Stilwell idea,²
Saathoff et al. circulated Li⁺ ions at $v = 0.064$ in a storage ring.
An electron-cooler technique was used in order to reduce the
variation in velocity among ions in the beam. Since the identity
$D(v)D(-v) = 1$ is independent of $v$, it was not necessary to measure
$v$ to the same incredible precision as the frequencies; it was
only necessary that it be stable and well-defined. The natural line
width was 7 MHz, and other experimental effects broadened it further
to 11 MHz. By curve-fitting the line, it was possible to achieve
results good to a few tenths of a MHz. The resulting frequencies,
in units of MHz, were:

$f_f = 582490203.44 \pm 0.09$
$f_b = 512671442.9 \pm 0.5$
$\sqrt{f_f f_b} = 546466918.6 \pm 0.3$
$f_0 = 546466918.8 \pm 0.4$ (from previous experimental work)

The spectacular agreement with theory has made this experiment 
a lightning rod for anti-relativity kooks. 
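
The arithmetic behind this comparison can be reproduced in a few lines. The following Python sketch takes the frequencies quoted above and forms their geometric mean, with a first-order error propagation that treats the two uncertainties as independent (an assumption of mine, not a statement about the published analysis). The helper name `geometric_mean` is illustrative. The last lines estimate the ion velocity from the ratio of the two frequencies, using the fact that this ratio is D squared.

```python
import math

# Frequencies in MHz as quoted in the text: (value, uncertainty).
f_forward = (582490203.44, 0.09)
f_backward = (512671442.9, 0.5)
f_rest = (546466918.8, 0.4)

def geometric_mean(a, b):
    """Return sqrt(a*b) and its uncertainty, assuming independent errors."""
    (fa, sa), (fb, sb) = a, b
    mean = math.sqrt(fa * fb)
    sigma = 0.5 * mean * math.hypot(sa / fa, sb / fb)
    return mean, sigma

mean, sigma = geometric_mean(f_forward, f_backward)
difference = mean - f_rest[0]
combined = math.hypot(sigma, f_rest[1])
print(f"sqrt(f_f f_b) = {mean:.1f} +/- {sigma:.1f} MHz")
print(f"difference from f_0 = {difference:+.2f} MHz (combined uncertainty {combined:.2f} MHz)")

# f_f / f_b = D(v)^2, so v = (D^2 - 1)/(D^2 + 1) in units with c = 1.
d_squared = f_forward[0] / f_backward[0]
v = (d_squared - 1) / (d_squared + 1)
print(f"implied ion velocity v = {v:.4f}")
```

The implied velocity should come out close to the value 0.064 quoted for the storage ring.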

If one is searching for small deviations from the predictions of 
special relativity, a natural place to look is at high velocities. Ives- 
Stilwell experiments have been performed at velocities as high as 
0.84, and they confirm special relativity.³

3.3 Combination of velocities 

In nonrelativistic physics, velocities add in relative motion. For
example, if a boat moves relative to a river, and the river moves
relative to the land, then the boat’s velocity relative to the land
is found by vector addition. This linear behavior cannot hold
relativistically. For example, if a spaceship is moving relative to the
earth at velocity 3/5 (in units with c = 1), and it launches a probe
at velocity 3/5 relative to itself, we can’t have the probe moving at
a velocity of 6/5 relative to the earth, because this would be greater
than the maximum speed of cause and effect, which is 1. To see how
to add velocities relativistically, we consider the effect of carrying
out the two Lorentz transformations one after the other, figure d.

d / Two Lorentz transformations of v = 3/5 are applied one
after the other. The transformations are represented according
to the graphical conventions of section 1.4.

e / Example 2. Axis label: $a_N t_N$ (m/s $\times 10^8$).

2 G. Saathoff et al., “Improved Test of Time Dilation in Relativity,” Phys.
Rev. Lett. 91 (2003) 190403. A publicly available description of the experiment
is given in Saathoff’s PhD thesis, www.mpi-hd.mpg.de/ato/homes/saathoff/diss-saathoff.pdf.

3 MacArthur et al., Phys. Rev. Lett. 56 (1986) 282.

The inverse slope of the left side of each parallelogram indicates 
its velocity relative to the original frame, represented by the square. 
Since the left side of the final parallelogram has not swept past the 
diagonal, clearly it represents a velocity of less than 1, not more. To 
determine the result, we use the fact that the D factors multiply. We 
chose velocities 3/5 because it gives D = 2, which is easy to work 
with. Doubling the long diagonal twice gives an over-all stretch 
factor of 4, and solving the equation D(v) = 4 for v gives the result, 
v = 15/17. 
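
The procedure just described (multiply the D factors, then solve for v) can be sketched in Python as follows. The function names are hypothetical, and the inversion v = (D² - 1)/(D² + 1) is the one obtained by solving the expression for D(v) algebraically.

```python
import math

def doppler(v):
    """D(v) in units with c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def velocity_from_doppler(d):
    """Invert D(v): v = (D^2 - 1)/(D^2 + 1)."""
    return (d * d - 1) / (d * d + 1)

def combine(v1, v2):
    """Combine two collinear velocities by multiplying their D factors."""
    return velocity_from_doppler(doppler(v1) * doppler(v2))

v = combine(3 / 5, 3 / 5)
print(v, 15 / 17)  # the two numbers should agree to rounding error
```

Trying other inputs, such as `combine(0.9, 0.9)`, gives results that stay below 1.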

We can now see the answer to question 2 on p. 53. If we keep 
accelerating a spaceship steadily, we are simply continuing the pro- 
cess of acceleration shown in figure d. If we do this indefinitely, the 
velocity will approach c = 1 but never surpass it. (For more on this 
topic of going faster than light, see section 4.7.) 

Accelerating electrons Example 2 

Figure e shows the results of a 1964 experiment by Bertozzi in
which electrons were accelerated by the static electric field $E$ of
a Van de Graaff accelerator of length $\ell_1$. They were then allowed
to fly down a beamline of length $\ell_2 = 8.4$ m without being acted
on by any force. The time of flight $t_2$ was used to find the final
velocity $v = \ell_2/t_2$ to which they had been accelerated. (To make
the low-energy portion of the graph legible, Bertozzi’s highest-energy
data point is omitted.)

If we believed in Newton’s laws, then the electrons would have
an acceleration $a_N = Ee/m$, which would be constant if, as we
pretend for the moment, the field $E$ were constant. (The electric
field inside a Van de Graaff accelerator is not really quite uniform,
but this will turn out not to matter.) The Newtonian prediction for
the time over which this acceleration occurs is $t_N = \sqrt{2m\ell_1/eE}$.
An acceleration $a_N$ acting for a time $t_N$ should produce a final
velocity $a_N t_N = \sqrt{2eV/m}$, where $V = E\ell_1$ is the voltage difference.
(By conservation of energy, this equation holds even if the field
is not constant.) The solid line in the graph shows the prediction
of Newton’s laws, which is that a constant force exerted steadily
over time will produce a velocity that rises linearly and without
limit.
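
To get a feeling for the numbers, the Newtonian expression $\sqrt{2eV/m}$ can be evaluated for a few accelerating voltages. In the sketch below the voltages are illustrative values that I have picked, not Bertozzi’s actual data points, and the constants are standard SI values for the electron.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
M_ELECTRON = 9.1093837015e-31  # electron mass, kg
C_LIGHT = 299_792_458.0        # speed of light, m/s

def newtonian_speed(voltage):
    """Newtonian prediction sqrt(2eV/m) for an electron, in m/s."""
    return math.sqrt(2 * E_CHARGE * voltage / M_ELECTRON)

# Illustrative voltages only (assumed for this sketch).
for voltage in (1e4, 1e5, 2.555e5, 1e6):
    ratio = newtonian_speed(voltage) / C_LIGHT
    print(f"V = {voltage:9.0f} V   Newtonian v/c = {ratio:.3f}")
```

The Newtonian prediction crosses c at a voltage of roughly a quarter of a million volts, where eV equals half the electron’s mc², and keeps growing beyond that.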

The experimental data, shown as black dots, clearly tell a differ- 
ent story. The velocity asymptotically approaches a limit, which 
we identify as c. The dashed line shows the predictions of spe- 
cial relativity, which we are not yet ready to calculate because we 
haven’t yet seen how kinetic energy depends on velocity at rel- 
ativistic speeds. The calculation is carried out in example 4 on 
p. 85.

Note that the relationship between the first and second frames of
reference in figure d is the same as the relationship between the second
and third. Therefore if a passenger is to feel a steady sensation
of acceleration (or, equivalently, if an accelerometer aboard the ship
is to show a constant reading), then the proper time required to pass
from the first frame to the second must be the same as the proper
time to go from the second to the third. A nice way to express this is
to define the rapidity $\eta = \ln D$. Combining velocities means multiplying
$D$’s, which is the same as adding their logarithms. Therefore
we can write the relativistic rule for combining velocities simply as

$$\eta_c = \eta_1 + \eta_2.$$

The passengers perceive the acceleration as steady if $\eta$ increases by
the same amount per unit of proper time. In other words, we can
define a proper acceleration $d\eta/d\tau$, which corresponds to what an
accelerometer measures.

Rapidity is convenient and useful, and is very frequently used
in particle physics. But in terms of ordinary velocities, the rule for
combining velocities can also be rewritten using identity [9] from
section 3.6 as

$$v_c = \frac{v_1 + v_2}{1 + v_1 v_2}.$$


self-check B 

How can we tell that this equation is written in natural units? Rewrite it 
in SI units. > Answer, p. ?? 
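
The equivalence of the two ways of combining velocities, adding rapidities and using the formula in terms of ordinary velocities, can be checked numerically. The sketch below works in natural units and uses the fact that ln D(v) is the same function as the inverse hyperbolic tangent of v. The function names are my own. The loop at the end repeats a boost of 3/5 ten times, as a rough stand-in for steady acceleration.

```python
import math

def rapidity(v):
    """Rapidity eta = ln D(v), which equals atanh(v) in units with c = 1."""
    return math.atanh(v)

def combine_by_rapidity(v1, v2):
    """Add rapidities, then convert back to a velocity."""
    return math.tanh(rapidity(v1) + rapidity(v2))

def combine_by_formula(v1, v2):
    """Combination of velocities written directly in terms of v."""
    return (v1 + v2) / (1 + v1 * v2)

print(combine_by_rapidity(0.6, 0.6), combine_by_formula(0.6, 0.6))

# Repeated boosts: the velocity creeps toward 1 but stays below it.
v = 0.0
history = []
for _ in range(10):
    v = combine_by_formula(v, 3 / 5)
    history.append(v)
print(history[-1])
```

Each boost adds the same amount, ln 2, to the rapidity, while the velocity gains less and less.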

3.4 No frame of reference moving at c 

We have seen in section 3.3 that no continuous process of accel- 
eration can boost a material object to c. That is, the subluminal 
(slower than light) nature of an electron or a person is a fundamental
feature of its identity and can never be changed. Einstein can never 
get on his motorcycle and drive at c as he imagined when he was a 
young man, so we material beings can never see the world from a 
frame of reference that travels at c. 

Our universe does, however, contain ingredients such as light 
rays, gluons, and gravitational waves that travel at c, so we might 
wonder whether these things could be put together to form observers 
who do move at c. But this is not possible according to special relativity,
because if we let v approach c, extrapolation of figure
d on p. 58 shows that the Lorentz transformation would compress
all of spacetime onto the light cone, reducing its number of dimen- 
sions by 1. Distinct points would be merged, which would make it 
impossible to use this frame to describe the same phenomena that 
a subluminal observer could describe. That is, the transformation 
would not be one-to-one, and this is unacceptable physically.

f / A playing card returns to
its original state when rotated 
by 180 degrees. Its orientation, 
unlike the orientation of an arrow, 
doesn’t behave as a vector, since 
it doesn’t transform in the usual 
way under rotations. Under a 
180-degree rotation, a vector 
should negate itself rather than 
coming back to its original state. 


3.5 The velocity and acceleration vectors 

3.5.1 The velocity vector 

In a freshman course in Newtonian mechanics, we would define 
a vector as something that has three components. Furthermore, we 
would require it to transform in a certain way under a rotation. 
For example, we could form the collection of numbers (e,T, DJIA), 
where e is the fundamental charge, T is the temperature in Buffalo, 
New York, and DJIA measures how the stock market is doing. But 
this would not be a vector, since it doesn’t act the right way when 
rotated (this particular “vector” is invariant under rotations). Fig- 
ure f gives a less silly non-example. In contradistinction to a vector, 
a scalar is specified by a single real number and is invariant under 
rotations. 

The most basic example of a Newtonian vector was a displacement
$(\Delta x, \Delta y, \Delta z)$, and from the displacement vector we would
go on to construct other quantities such as a velocity vector
$\mathbf{v} = \Delta\mathbf{r}/\Delta t$. This worked because in Newtonian mechanics $\Delta t$ was treated
as a scalar, and dividing a vector by a scalar produces something
that again transforms in the right way to be a vector.

Now let’s upgrade to relativity, and work through the same steps
by analogy. When I say “vector” in this book, I mean something that
in 3+1 dimensions has four components. This can also be referred
to as a four-vector. Our only example so far has been the spacetime
displacement vector $\Delta\mathbf{r} = (\Delta t, \Delta x, \Delta y, \Delta z)$. This vector transforms
according to the Lorentz transformation. In general, we require as
part of the definition of a (four-)vector that it transform in the usual
way under both rotations and boosts (Lorentz transformations). We
might now imagine that the next step should be to construct a
velocity four-vector $\Delta\mathbf{r}/\Delta t$. But relativistically, the quantity $\Delta\mathbf{r}/\Delta t$
would not transform like a vector. For example, if $\Delta\mathbf{r}$ were spacelike, then there
would be a frame in which we had $\Delta t = 0$, and then $\Delta\mathbf{r}/\Delta t$ would
be finite in some frames but infinite in others, which is absurd.

To construct a valid vector, we have to divide $\Delta\mathbf{r}$ by a scalar.
The only scalar that could be relevant would be the proper time $\Delta\tau$,
and this is indeed how the velocity vector is defined in relativity.
For an inertial world-line (one with constant velocity), we define
$\mathbf{v} = \Delta\mathbf{r}/\Delta\tau$. The generalization to noninertial world-lines requires
that we make this definition into a derivative:

$$\mathbf{v} = \frac{d\mathbf{r}}{d\tau}$$

Not all objects have well-defined velocity vectors. For example,
consider a ray of light with a straight world-line, so that the
derivative $d\ldots/d\ldots$ is the same as the ratio of finite differences
$\Delta\ldots/\Delta\ldots$, i.e., calculus isn’t needed. A ray of light has $v = c$,
so that applying the metric to any segment of its world-line gives
$\Delta\tau = 0$. Attempting to calculate $\mathbf{v} = \Delta\mathbf{r}/\Delta\tau$ then gives something
with infinite components. We will see in section 4.3.4 that all massless
particles, not just photons, travel at c, so the same would apply
to them. Therefore a velocity vector is only defined for particles
whose world-lines are timelike, i.e., massive particles.

Velocity vector of an object at rest Example 3 

An object at rest has $\mathbf{v} = (1, 0)$. The first component indicates that
if we attach a clock to the object with duct tape, the proper time
measured by the clock suffers no time dilation according to an
observer in this frame, $dt/d\tau = 1$. The second component tells
us that the object’s position isn’t changing, $dx/d\tau = 0$.

3.5.2 The acceleration vector 

The acceleration vector is defined as the derivative of the velocity
vector with respect to proper time,

$$\mathbf{a} = \frac{d\mathbf{v}}{d\tau}.$$

It measures the curvature of a world-line. Its squared magnitude is
minus the square of the proper acceleration, meaning the acceleration
that would be measured by an accelerometer carried along
that world-line. The proper acceleration is only approximately equal
to the magnitude of the Newtonian acceleration three-vector, in the
limit of small velocities.

Constant proper acceleration Example 4

> Suppose a spaceship moves so that the acceleration is judged
to be the constant value $a$ by an observer on board. Find the
motion $x(t)$ as measured by an observer in an inertial frame.

> Let $\tau$ stand for the ship’s proper time, and let dots indicate
derivatives with respect to $\tau$. The ship’s velocity has magnitude
1, so

$$\dot{t}^2 - \dot{x}^2 = 1.$$

An observer who is instantaneously at rest with respect to the
ship judges it to have an acceleration vector $(0, a)$ (because the
low-velocity limit applies). The observer in the $(t, x)$ frame agrees
on the magnitude of this vector, so

$$\ddot{t}^2 - \ddot{x}^2 = -a^2.$$

The solution of these differential equations is $t = \frac{1}{a}\sinh a\tau$,
$x = \frac{1}{a}\cosh a\tau$ (choosing constants of integration so that the
expressions take on their simplest forms). Eliminating $\tau$ gives

$$x = \frac{1}{a}\sqrt{1 + a^2 t^2},$$

shown in figure g. The world-line is a hyperbola, and this type of
motion is sometimes referred to as hyperbolic motion.
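
The claimed solution can be spot-checked numerically. The following Python sketch evaluates the world-line and its first and second proper-time derivatives at a few values of the proper time, and confirms the two magnitude conditions and the final expression for x in terms of t. The value a = 0.5 and the function names are arbitrary choices made for illustration, and the signature (+, -) is assumed for the 1+1-dimensional magnitude.

```python
import math

def worldline(a, tau):
    """Return (t, x) at proper time tau for proper acceleration a (c = 1)."""
    return math.sinh(a * tau) / a, math.cosh(a * tau) / a

def velocity_vector(a, tau):
    """Components (dt/dtau, dx/dtau) of the velocity vector."""
    return math.cosh(a * tau), math.sinh(a * tau)

def acceleration_vector(a, tau):
    """Components (d2t/dtau2, d2x/dtau2) of the acceleration vector."""
    return a * math.sinh(a * tau), a * math.cosh(a * tau)

def squared_magnitude(vec):
    """Squared magnitude in 1+1 dimensions with signature (+, -)."""
    return vec[0] ** 2 - vec[1] ** 2

a = 0.5  # arbitrary illustrative proper acceleration
for tau in (0.0, 0.5, 1.0, 2.0):
    t, x = worldline(a, tau)
    assert math.isclose(squared_magnitude(velocity_vector(a, tau)), 1.0)
    assert math.isclose(squared_magnitude(acceleration_vector(a, tau)), -a * a)
    assert math.isclose(x, math.sqrt(1 + a * a * t * t) / a)
    print(f"tau = {tau:.1f}   t = {t:.4f}   x = {x:.4f}")
```

A spot check at a handful of points is of course not a proof, but it is a cheap way to catch sign errors.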



g / A spaceship (curved world-line) moves with an acceleration
perceived as constant by its passengers.

As $t$ approaches infinity, $dx/dt$ approaches the speed of light.
In the same limit, $x$ increases exponentially with proper time, so
that surprisingly large distances can in theory be traveled within
a human lifetime (problem 7, p. 112). Some further properties of
hyperbolic motion are developed in problems 10, 11, and 12.

Another interesting feature of this problem is the dashed-line asymp- 
tote, which is lightlike. Suppose we interpret this as the world-line 
of a ray of light. The ray comes closer and closer to the ship, 
but will never quite catch up. Thus provided that the rocket never 
stops accelerating, the entire region of spacetime to the left of 
the dashed line is forever hidden from its passengers. That is, 
an observer who undergoes constant acceleration has an event 
horizon — a boundary that prevents her from observing anything 
on the other side. You may have heard about the event horizon 
associated with a black hole. This example shows that we can 
have event horizons even when there is no gravity at all. 
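
The statement that the light ray never quite catches up can be illustrated with the hyperbolic world-line $x = \frac{1}{a}\sqrt{1 + a^2 t^2}$ found above, taking the asymptote to be the ray x = t. The sketch below, with a = 1 chosen arbitrarily and hypothetical function names, tabulates the gap between ship and ray at increasingly late times.

```python
import math

def ship_position(a, t):
    """Position of the ship at inertial-frame time t (units with c = 1)."""
    return math.sqrt(1 + a * a * t * t) / a

def gap_to_light_ray(a, t):
    """Ship position minus the position x = t of the light ray."""
    return ship_position(a, t) - t

a = 1.0  # arbitrary illustrative proper acceleration
gaps = [gap_to_light_ray(a, 10.0 ** k) for k in range(7)]  # t = 1, 10, ..., 10^6
for k, gap in enumerate(gaps):
    print(f"t = 10^{k}:  gap = {gap:.3e}")
```

The gap shrinks roughly like 1/(2a²t) at late times, but it remains positive at every time in the table.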

3.5.3 Constraints on the velocity and acceleration vectors 

Counting degrees of freedom 

There is something misleading about the foregoing treatment of 
the velocity and acceleration vectors, and the easiest way to see this 
is by introducing the idea of a degree of freedom. Often we can 
describe a system using a list of real numbers. For the hand on a 
clock, we only need one number, such as 3 o’clock. This is because 
the hand is constrained to stay in the plane of the clock’s face and 
also to keep its tail at the center of the circle. Since one number 
describes its position, we say that it has one degree of freedom. If 
a hiker wants to know where she is on a map, she has two degrees 
of freedom, which could be specified as her latitude and longitude. 

If she was in a helicopter, there would be no constraint to stay on 
the earth’s surface, and the number of degrees of freedom would be 
increased to three. If we also considered the helicopter’s velocity to 
be part of the description of its state, then there would be a total 
of six degrees of freedom: one for each coordinate and one for each 
component of the velocity vector. 

Now suppose that we want to specify a particle’s velocity and acceleration.
In Newtonian mechanics, we would describe these three-vectors
as possessing a total of six degrees of freedom: $v_x$, $v_y$, $v_z$,
$a_x$, $a_y$, and $a_z$. Upgrading from Newtonian mechanics to relativity
can’t change the number of degrees of freedom. For example, an
electron’s acceleration is fully determined by the force we exert on it, 
and we might control that acceleration by placing a proton nearby 
and producing an electrical attraction. The position of the proton 
(three degrees of freedom for its three coordinates) determines the 
electron’s acceleration, so the acceleration has exactly three degrees 
of freedom as well.

This means that there must be some hidden redundancy in the
eight components of the velocity and acceleration four-vectors. The 
system only has six degrees of freedom, so there must be two con- 
straints that we didn’t know about. Similarly, I’ve gone hiking and 
had my GPS unit claim that I was a thousand feet above a lake or 
three thousand feet under a mountain. In those situations there was 
a constraint that I knew about but that the GPS didn’t: that I was 
on the surface of the earth. 


Normalization of the velocity 

The first constraint arises naturally from a geometrical inter- 
pretation of the velocity four-vector, shown in figure h. The curve 
represents the world-line of a particle. The dashed line is drawn 
tangent to the world-line at a certain moment. Under a microscope, 
the dashed line, which represents a possible inertial motion of a par- 
ticle, is indistinguishable from the solid curve, which is noninertial. 
The dashed line has a slope $\Delta t/\Delta x = 2$, which corresponds to a
velocity $\Delta x/\Delta t = 1/2$. The figure is drawn in 1 + 1 dimensions, but
in 3 + 1 dimensions we would want to know more than this num- 
ber. We would want to know the orientation of the dashed line in 
the three spatial dimensions, i.e., not just the speed of the particle 
but also its direction of motion. All the desired information can be 
encapsulated in a vector. Both of the vectors shown in the figure 
are parallel to the dashed line, so even though they have different 
lengths, there is no difference between the velocities they represent. 
Since we want the particle to have a single well-defined vector to 
represent its velocity, we want to pick one vector from among all 
the vectors parallel to the dashed line, and call that “the” velocity 
vector. 

We have already implicitly made this choice. It follows from 
the original definition $\mathbf{v} = d\mathbf{r}/d\tau$ that the velocity vector’s squared
magnitude $\mathbf{v}^2 = \mathbf{v}\cdot\mathbf{v}$ is always equal to 1, even though the object
whose motion it describes is not moving at the speed of light.
This, along with the requirement that the velocity vector lie within 
the future rather than the past light cone, uniquely specifies which 
tangent vector we want. The requirement $\mathbf{v}^2 = 1$ is an example of
a recurring idea in physics and mathematics called normalization. 
The idea is that we have some object (a vector, a function, . . . ) 
that could be scaled up or down by any amount, but from among 
all the possible scales, there is only one that is the right one. For 
example, a gambler might place a horse’s chance of winning at 9 to 
1, but a physicist would divide these by 10 in order to normalize the 
probabilities to 0.9 and 0.1, the idea being that the total probability 
should add up to 1. Our definition of the velocity vector implies that 
it is normalized. Thus an alternative, geometrical definition of the 
velocity vector would have been that it is the vector that is tangent 
to the particle’s world-line, future-directed, and normalized to 1.



h / Both vectors are tangent vectors.

When we hear something referred to as a “vector,” we usually
take this as a statement that it not only transforms as a vector, but
also that it adds as a vector. But the sum of two velocity vectors 
would not typically be a valid velocity vector at all, since it would 
not have unit magnitude. This lack of additivity would in any case 
have been expected because velocities don’t add linearly in relativity 
(section 3.3). 

self-check C 

Velocity vectors are required to have $\mathbf{v}^2 = 1$. If a vector qualifies as a
valid velocity vector in some frame, could it be invalid in another frame?

> Answer, p. ?? 

A nice way of thinking about velocity vectors is that every such
vector represents a potential observer. That is, the velocity vectors
are the observer-vectors $\mathbf{o}$ of chapter 1, but with a normalization
requirement $\mathbf{o}^2 = 1$ that we did not impose earlier. An observer
writes her own velocity vector as $(1, 0)$, i.e., as the unit vector in the
timelike direction. Since we have no notion of adding one observer
to another observer, it makes sense that velocity vectors don’t add
relativistically. Similarly, there is no meaningful way to define the
magnitude of an observer, so it makes sense that the magnitude of a
velocity vector carries no useful information and can arbitrarily be
set equal to 1.

Regarding the magnitude, note also that the magnitude of a 
vector is frame-invariant, and therefore it wouldn’t make sense to 
imagine that the magnitude of an object’s four-velocity would pro- 
duce some number telling you how fast the object was going. How 
fast relative to what? 

If $\mathbf{u}$ and $\mathbf{v}$ are both future-directed, properly normalized velocity
vectors, and if the signature is $+---$ as in this book, then their
inner product is $\gamma = \mathbf{u}\cdot\mathbf{v}$, the gamma factor, introduced in section
1.3.3, p. 25, corresponding to their relative velocity.
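
Both the normalization and the inner-product statement can be illustrated in 1+1 dimensions. In the sketch below, a velocity vector is built from an ordinary velocity as (γ, γv), which is normalized to 1, and the signature (+, -) is assumed. The function names, and the choice of two objects moving at ±3/5, are mine. The last lines show that the sum of two velocity vectors is not itself normalized.

```python
import math

def gamma(v):
    """Gamma factor for velocity v in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def velocity_vector(v):
    """Normalized velocity vector (gamma, gamma*v) in 1+1 dimensions."""
    g = gamma(v)
    return (g, g * v)

def inner(u, w):
    """Inner product with signature (+, -)."""
    return u[0] * w[0] - u[1] * w[1]

def relative_velocity(v1, v2):
    """Velocity of object 1 as measured in the rest frame of object 2."""
    return (v1 - v2) / (1 - v1 * v2)

u = velocity_vector(3 / 5)
w = velocity_vector(-3 / 5)
print(inner(u, u), inner(w, w))                    # both normalized to 1
print(inner(u, w), gamma(relative_velocity(3 / 5, -3 / 5)))

total = (u[0] + w[0], u[1] + w[1])
print(inner(total, total))                         # not 1: the sum is not a velocity vector
```

For this pair the relative velocity is 15/17, so both printed numbers on the second line should be 17/8.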

Orthogonality of the velocity and acceleration 

Now for the second of the two constraints deduced on p. 62. 

Suppose an observer claims that at a certain moment in time,
a particle has $\mathbf{v} = (1, 0)$ and $\mathbf{a} = (3, 0)$. That is, the particle is
at rest ($v_x = 0$) and its $v_t$ is growing by 3 units per second. This
is impossible, because after an infinitesimal time interval $dt$, this
rate of change will result in $\mathbf{v} = (1 + 3\,dt, 0)$, which is not properly
normalized: its magnitude has grown from 1 to $1 + 3\,dt$. The observer
is mistaken. This is not a possible combination of velocity and
acceleration vectors. In general (problem 9, p. 76), we always have
the following constraint on the velocity and acceleration vectors: 

a • v = 0. 

This is analogous to the three-dimensional idea that in uniform circular
motion, the perpendicularity of the velocity and acceleration
three-vectors is what causes the velocity vector to rotate without
changing its magnitude.
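
The two constraints, normalization of the velocity and orthogonality of the acceleration to it, are easy to test for any proposed pair of vectors. The following sketch does this in 1+1 dimensions with signature (+, -); the helper `is_consistent` is a hypothetical name. It rejects the pair claimed by the mistaken observer, accepts the pair with the acceleration pointing in the spacelike direction, and also accepts that same pair after both vectors are boosted by an arbitrarily chosen rapidity of 0.7.

```python
import math

def inner(u, w):
    """Inner product in 1+1 dimensions with signature (+, -)."""
    return u[0] * w[0] - u[1] * w[1]

def is_consistent(v, a, tol=1e-9):
    """True if v is normalized to 1 and a is orthogonal to v."""
    return math.isclose(inner(v, v), 1.0, rel_tol=tol) and abs(inner(a, v)) <= tol

claimed = is_consistent((1.0, 0.0), (3.0, 0.0))  # the mistaken observer's pair
allowed = is_consistent((1.0, 0.0), (0.0, 3.0))  # acceleration purely spacelike

# The allowed pair as described in a frame boosted by rapidity eta.
eta = 0.7
v = (math.cosh(eta), math.sinh(eta))
a = (3 * math.sinh(eta), 3 * math.cosh(eta))
boosted = is_consistent(v, a)

print(claimed, allowed, boosted)
```

In the boosted frame the acceleration vector has a nonzero time component, but it is still orthogonal to the velocity vector.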

3.6 ★ Some kinematic identities

In addition to the relations

$$D(v) = \sqrt{\frac{1+v}{1-v}}$$

and

$$v_c = \frac{v_1 + v_2}{1 + v_1 v_2},$$

the following identities can be handy. If stranded on a desert island
you should be able to rederive them from scratch. Don’t memorize
them.


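[1]  $v = (D^2 - 1)/(D^2 + 1)$
[2]  $\gamma = (D^{-1} + D)/2$
[3]  $v\gamma = (D - D^{-1})/2$
[4]  $D(v)D(-v) = 1$
[5]  $\eta = \ln D$
[6]  $v = \tanh \eta$
[7]  $\gamma = \cosh \eta$
[8]  $v\gamma = \sinh \eta$
[9]  $\tanh(x + y) = \dfrac{\tanh x + \tanh y}{1 + \tanh x \tanh y}$
[10] $D_c = D_1 D_2$
[11] $\eta_c = \eta_1 + \eta_2$
[12] $v_c \gamma_c = (v_1 + v_2)\gamma_1 \gamma_2$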

The hyperbolic trig functions are defined as follows:

$$\sinh x = \frac{e^x - e^{-x}}{2}$$

$$\cosh x = \frac{e^x + e^{-x}}{2}$$

$$\tanh x = \frac{\sinh x}{\cosh x}$$

Their inverses are built in to some calculators and computer software,
but they can also be calculated using the following relations:

$$\sinh^{-1} x = \ln\left(x + \sqrt{x^2 + 1}\right)$$

$$\cosh^{-1} x = \ln\left(x + \sqrt{x^2 - 1}\right)$$

$$\tanh^{-1} x = \frac{1}{2}\ln\frac{1 + x}{1 - x}$$
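
The identities of this section are easy to spot-check numerically. The sketch below does so for v = 3/5 (for which D = 2, as in problem 6) and for two combined velocities; the dictionary layout and variable names are illustrative assumptions.

```python
import math

def doppler(v):
    """Longitudinal Doppler factor D(v), units with c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def gamma_of(v):
    return 1 / math.sqrt(1 - v * v)

v = 0.6
D, gamma, eta = doppler(v), gamma_of(v), math.log(doppler(v))

# Each entry pairs the left-hand and right-hand sides of one identity.
identities = {
    "[1]": (v, (D**2 - 1) / (D**2 + 1)),
    "[2]": (gamma, (1 / D + D) / 2),
    "[3]": (v * gamma, (D - 1 / D) / 2),
    "[4]": (doppler(v) * doppler(-v), 1.0),
    "[6]": (v, math.tanh(eta)),
    "[7]": (gamma, math.cosh(eta)),
    "[8]": (v * gamma, math.sinh(eta)),
}

# Combination of two velocities: identities [10] and [12].
v1, v2 = 0.5, 0.3
vc = (v1 + v2) / (1 + v1 * v2)
identities["[10]"] = (doppler(vc), doppler(v1) * doppler(v2))
identities["[12]"] = (vc * gamma_of(vc), (v1 + v2) * gamma_of(v1) * gamma_of(v2))

# Logarithmic forms of the inverse hyperbolic functions.
x, y = 0.6, 1.25
inverses = {
    "asinh": (math.asinh(x), math.log(x + math.sqrt(x * x + 1))),
    "acosh": (math.acosh(y), math.log(y + math.sqrt(y * y - 1))),
    "atanh": (math.atanh(x), 0.5 * math.log((1 + x) / (1 - x))),
}

for name, (lhs, rhs) in {**identities, **inverses}.items():
    print(name, lhs, rhs)
```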


The derivatives of these inverse functions are, respectively,
$(x^2 + 1)^{-1/2}$, $(x^2 - 1)^{-1/2}$, and $(1 - x^2)^{-1}$.

3.7 ★ The projection operator

A frequent source of confusion in relativity is that we write down 
equations that are coordinate-dependent, but forget the dependency. 
Similarly, it is possible to write expressions that are only valid for 
one choice of signature. The following notation, defining a projection 
operator P, is one tool for avoiding these difficulties. 

$$P_o r = r - \frac{r \cdot o}{o \cdot o}\, o \qquad (1)$$

Usually o is the future timelike vector representing a certain ob- 
server, but the definition can be applied as long as o isn’t lightlike. 
The idea being expressed is that we want to get rid of any part 
of r that is parallel to o’s arrow of time. In a graph constructed 
according to o’s Minkowski coordinates, we cast r’s shadow down 
perpendicularly onto the spacelike axis, or the spacelike three-plane 
in 3 + 1 dimensions. This is why P is referred to as a projection op- 
erator. The notation sometimes allows us to express the things that 
we would otherwise express by explicitly or implicitly constructing 
and referring to o’s spacelike Minkowski coordinates. P has the 
following properties: 

1. $o \cdot P_o r = 0$

2. $r - P_o r$ is parallel to $o$.

3. $P_o o = 0$

4. $P_o P_o r = P_o r$

5. $P_{co} = P_o$

6. $P_o$ is linear, i.e., $P_o(q + r) = P_o q + P_o r$ and $P_o(cr) = cP_o r$

7. $\frac{d}{dx} P_o r = P_o \frac{dr}{dx}$, where $x$ is any variable and $o$ doesn’t depend
on $x$.

8. If $o$ and $v$ are both future timelike, and $|o^2| = 1$, then we can
express $v$ as $v = P_o v + \gamma o$, where $\gamma$ has the usual interpretation
for world-lines that coincide with these two vectors.

All of these hold regardless of whether the signature is $+---$
or $-+++$, and none of them refer to any coordinates. Properties
1 and 2 can serve as an alternative, geometrical definition of $P$.
Property 3 says that an observer considers herself to be at rest. 4
is a general property of all projection operators. 8 splits the vector
into its spatial and temporal parts according to $o$.
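
The following sketch implements definition (1) and checks properties 1, 3, 4, 5, and 8 numerically in both signatures. The function names `inner` and `project`, the `sign` parameter, and the sample vectors are assumptions made for illustration.

```python
def inner(p, q, sign=+1):
    """Inner product of four-vectors; sign=+1 is +---, sign=-1 is -+++."""
    spatial = sum(a * b for a, b in zip(p[1:], q[1:]))
    return sign * (p[0] * q[0] - spatial)

def project(o, r, sign=+1):
    """P_o r = r - (r.o / o.o) o, defined for any o that is not lightlike."""
    k = inner(r, o, sign) / inner(o, o, sign)
    return tuple(ri - k * oi for ri, oi in zip(r, o))

def diff(p, q):
    return tuple(a - b for a, b in zip(p, q))

o = (1.25, 0.75, 0.0, 0.0)              # observer moving at 0.6 along x
v = (5.0 / 3.0, 0.0, 4.0 / 3.0, 0.0)    # particle moving at 0.8 along y
r = (2.0, -1.0, 0.5, 3.0)               # arbitrary vector

residuals = []                          # every entry should be close to zero
for sign in (+1, -1):
    p = project(o, r, sign)
    gamma = abs(inner(o, v, sign))      # o.v is +gamma or -gamma by signature
    pv = project(o, v, sign)
    residuals.append(inner(o, p, sign))                           # property 1
    residuals.extend(project(o, o, sign))                         # property 3
    residuals.extend(diff(project(o, p, sign), p))                # property 4
    scaled_o = tuple(3.0 * c for c in o)
    residuals.extend(diff(project(scaled_o, r, sign), p))         # property 5
    residuals.extend(diff(v, tuple(a + gamma * b for a, b in zip(pv, o))))  # 8

print(max(abs(x) for x in residuals))
```

Because `project` divides by o.o, the overall sign of the inner product cancels, which is why the same code works for either signature.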

Sometimes if we know a position, velocity, or acceleration four-vector,
we want to find out how these would be measured by a particular
observer using clocks and rulers. The following table shows
how to switch back and forth between the two representations. We
use, for example, the notation $v_o$ to mean the velocity vector of the
form $(0, v_x, v_y, v_z)$ that would be measured by an observer whose
velocity vector is $o$ (so that the subscript is an “o” for “observer,”
not a zero). Since this type of vector, expressed in the Minkowski
coordinates of observer $o$, has a zero time component, we refer to it
as a three-vector. In all of these expressions, the velocity vectors $o$
and $v$ are assumed to be normalized, and the signature is assumed
to be $+---$ (one implication being that $o \cdot v$ is simply $\gamma$).


finding the three-vector from the                    finding the four-vector from the
four-vector                                          three-vector

$x_o = P_o x$

$v_o = \frac{P_o v}{o \cdot v}$                      $v = \gamma(o + v_o)$

$a_o = \frac{1}{(o \cdot v)^2}[P_o a - (o \cdot a)v_o]$    $a = \gamma^3(a_o \cdot v_o)v + \gamma^2 a_o$, where
                                                     $v$ is found as above

As an example of how these are derived, the three-velocity $v_o$
is the derivative of $x_o$ with respect to observer o’s Minkowski time
coordinate $t$, whereas the four-velocity is defined as the derivative of
$x$ with respect to the proper time $\tau$ of the world-line being observed.
Therefore we have

$$v_o = \frac{dx_o}{dt} = \frac{d P_o x}{dt},$$

and applying property 7 of the projection operator this becomes

$$v_o = P_o \frac{dx}{dt} = P_o \frac{dx}{d\tau}\frac{d\tau}{dt} = \frac{1}{\gamma} P_o \frac{dx}{d\tau} = \frac{P_o v}{o \cdot v}.$$

The similar but messier derivation of the expression for $a_o$ is problem
15. In manipulating expressions of this type, the identity $d\gamma/dt =
\gamma^3 a_o \cdot v_o$ is often handy (problem 14).
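
As an illustration of the velocity entries in the table, the sketch below takes an observer and a particle that both move along the x axis of some third frame, extracts the three-velocity covariantly, and compares its magnitude with the ordinary combination-of-velocities result. The helper names and the sample speeds are assumptions for illustration; the signature is +---.

```python
import math

def inner(p, q):
    """Inner product in the +--- signature."""
    return p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))

def project(o, r):
    """P_o r = r - (r.o / o.o) o."""
    k = inner(r, o) / inner(o, o)
    return tuple(ri - k * oi for ri, oi in zip(r, o))

def four_velocity(speed):
    """Normalized velocity vector for motion along x at the given speed."""
    g = 1.0 / math.sqrt(1.0 - speed * speed)
    return (g, g * speed, 0.0, 0.0)

u, w = 0.5, 0.8                        # observer and particle speeds (c = 1)
o, v = four_velocity(u), four_velocity(w)

gamma = inner(o, v)                    # o.v is simply gamma
v_o = tuple(c / gamma for c in project(o, v))    # v_o = P_o v / (o.v)

# v_o is spacelike, so its squared norm is negative in the +--- signature.
speed_measured = math.sqrt(-inner(v_o, v_o))
speed_expected = (w - u) / (1.0 - u * w)

# Going back: v = gamma (o + v_o).
v_rebuilt = tuple(gamma * (a + b) for a, b in zip(o, v_o))

print(speed_measured, speed_expected)
```

Note that the components of `v_o` here are expressed in the third frame’s coordinates, so they do not have a vanishing time component; only in o’s own Minkowski coordinates does the vector take the form (0, v_x, v_y, v_z).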

Lewis-Tolman paradox Example 5 

The following example is a form of a paradox discussed by Lewis
and Tolman in 1909. Figure i shows the frame of reference of
observer o in which identical particles 1 and 2 are initially at rest
and located at equal distances $\ell$ from the origin along the y and x
axes. External forces of equal strength act in the directions shown
by the arrows so as to produce accelerations of magnitude $\alpha$. The
system is in rotational equilibrium, $dL/dt = 0$, because the rate
at which particle 1 picks up clockwise angular momentum is the
same as the rate at which 2 acquires it in the counterclockwise
direction.

i / Example 5.

Now change to the frame of reference o', moving to the right
relative to o at velocity $v$. Particle 2’s distance from the origin
is Lorentz-contracted from $\ell$ to $\ell/\gamma$, so its angular momentum is
also reduced by $1/\gamma$. It now appears that the system’s total
angular momentum is increasing in the clockwise sense. How can
we have rotational equilibrium in one frame, but not another?

The resolution of the paradox is that the accelerations transform
as well. In the original frame o, the four-velocities are $v_1 = v_2 =
(1, 0, 0, 0)$, and the four-accelerations are $a_1 = (0, \alpha, 0, 0)$ and $a_2 =
(0, 0, \alpha, 0)$. Applying a Lorentz transformation, we have $v'_1 = v'_2 =
(\gamma, -\gamma v, 0, 0)$ and

$$a'_1 = \alpha(-\gamma v, \gamma, 0, 0)$$
$$a'_2 = \alpha(0, 0, 1, 0).$$

Our definition of angular momentum is expressed in terms of
three-vectors such as $a_{o'1}$ and $a_{o'2}$, not four-vectors like $a'_1$ and
$a'_2$. We have

$$\frac{dL'}{dt'} = m a_{o'1x}\,\ell - m a_{o'2y}\,\frac{\ell}{\gamma}.$$

Using the relations $v_o = \gamma^{-1} P_o v$ and $a_o = \gamma^{-2}[P_o a - (o \cdot a)v_o]$,
we find

$$v_{o'1x} = -v,$$

$$a_{o'1x} = \frac{1}{\gamma^2}\left[\alpha\gamma - (-\alpha\gamma v)(-v)\right] = \frac{\alpha}{\gamma^3},$$

and

$$a_{o'2y} = \frac{\alpha}{\gamma^2}.$$

The result is

$$\frac{dL'}{dt'} = m\frac{\alpha}{\gamma^3}\,\ell - m\frac{\alpha}{\gamma^2}\,\frac{\ell}{\gamma},$$

which is zero.
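
The arithmetic of this example can be reproduced numerically. The sketch below boosts the four-vectors into o', extracts the three-accelerations with the projection-operator relations, and evaluates the rate of change of angular momentum. The function names and the sample values of v, α, m, and ℓ are illustrative assumptions.

```python
import math

def inner(p, q):
    """Inner product in the +--- signature."""
    return p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))

def project(o, r):
    k = inner(r, o) / inner(o, o)
    return tuple(ri - k * oi for ri, oi in zip(r, o))

def boost_x(vec, v):
    """Components of vec in a frame moving at velocity v along +x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    t, x, y, z = vec
    return (g * (t - v * x), g * (x - v * t), y, z)

def three_acceleration(o, vel, acc):
    """a_o = [P_o a - (o.a) v_o] / (o.v)^2, with v_o = P_o v / (o.v)."""
    g = inner(o, vel)
    v_o = tuple(c / g for c in project(o, vel))
    oa = inner(o, acc)
    return tuple((p - oa * w) / g**2 for p, w in zip(project(o, acc), v_o))

v, alpha, m, ell = 0.6, 1.0, 1.0, 1.0       # sample values, c = 1
gamma = 1.0 / math.sqrt(1.0 - v * v)

vel = boost_x((1.0, 0.0, 0.0, 0.0), v)      # both particles, seen from o'
a1 = boost_x((0.0, alpha, 0.0, 0.0), v)     # particle 1, accelerated along x
a2 = boost_x((0.0, 0.0, alpha, 0.0), v)     # particle 2, accelerated along y
o_prime = (1.0, 0.0, 0.0, 0.0)              # o' in its own coordinates

a1_three = three_acceleration(o_prime, vel, a1)
a2_three = three_acceleration(o_prime, vel, a2)

dL_dt = m * a1_three[1] * ell - m * a2_three[2] * ell / gamma
print(a1_three[1], a2_three[2], dL_dt)
```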
3.8 ★ Faster-than-light frames of reference?

We recall from section 3.4 that special relativity doesn’t permit the 
existence of observers who move at c. This is because if two ob- 
servers differ in velocity by c, then the Lorentz transformation be- 
tween them is not a one-to-one map, which is physically unaccept- 
able. 

But what about a superluminal observer, one who moves faster 
than c? With charming naivete, the special-effects technicians for 
Star Trek attempted to show the frame of reference of such an ob- 
server in scenes where a field of stars rushed past the Enterprise. 
(Never mind that the stars, which pass in front of and behind the 
spaceship, should actually be a million times larger than it.) Ac- 
tually such an observer would consider her own world-line, which 
we call spacelike, to be timelike, while the world-line of a star such 
as our sun, which we consider timelike, would be spacelike in her 
opinion. Our sun’s world-line might, for example, be orthogonal to 
hers, in which case the sun would not appear to her as an object 
in motion but rather as a line stretching across space, which would 
wink into existence and then wink back out. A typical transforma- 
tion between our frame and the frame of such an observer would be 
the map $S$ defined by $(t', x') = (x, t)$, simply swapping the time and
space coordinates. The “swap” transformation $S$ is one-to-one, and
therefore not subject to the objection raised previously to frames
moving at $c$. $S$ happens to be a boost by an infinite velocity, but we
can also obtain boosts for velocities $c < v < \infty$ and $-\infty < v < -c$
by combining $S$ with a (subluminal) Lorentz transformation; given
a superluminal world-line $\ell$, we first transform into a frame in which
$\ell$ is a line of simultaneity, and then we apply $S$.

But this was all in 1+1 dimensions. In 3+1 dimensions, what is
the equivalent of $S$? One possibility is something like $(t', x', y', z') =
(x, t, t, t)$, but this isn’t one-to-one. We can’t squish three dimensions
to one or expand one to three without merging points or splitting
one point into many.

Another possibility would be a one-to-one transformation such
as $(t', x', y', z') = (x, t, y, z)$. The trouble with this version is that it
violates the isotropy of spacetime (section 2.3, p. 46). For example, 
consider the vector (1,0, 1,0) in the unprimed coordinates. This 
lies on the light cone, and could point along the world-line of a ray 
of light. After the transformation to the primed coordinates, this 
vector becomes (0, 1, 1, 0), which points along a line of simultaneity. 
The primed observer says that the speed of light in this direction is 
infinite, and yet there are other directions in which it has a finite 
value. This clearly violates isotropy. 
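
The isotropy argument can be made concrete with a few lines of code. In this sketch the function names `swap` and `interval` are illustrative; the vectors are the ones used in the text, plus one additional light ray along the x axis for comparison.

```python
def interval(vec):
    """Squared interval in the +--- signature; zero for lightlike vectors."""
    t, x, y, z = vec
    return t * t - x * x - y * y - z * z

def swap(vec):
    """The candidate superluminal transformation (t', x', y', z') = (x, t, y, z)."""
    t, x, y, z = vec
    return (x, t, y, z)

ray_xy = (1, 0, 1, 0)     # light ray along y in the unprimed frame
ray_x = (1, 1, 0, 0)      # light ray along x in the unprimed frame

image_xy = swap(ray_xy)   # zero time component: a direction of simultaneity
image_x = swap(ray_x)     # unchanged: light still moves at speed 1 along x'

print(interval(ray_xy), interval(ray_x), image_xy, image_x)
```

The first image has no time component, so the primed observer would assign light an infinite speed in that direction, while the second image still describes motion at unit speed.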

A surprisingly large number of papers, going all the way back to
the birth of relativity, have been written by people trying to find a
way to extend the Lorentz transformations to superluminal speeds,
and these have all turned out to be failures. In fact, there are no-go
theorems showing that there can be no such thing as a superluminal
observer in our 3+1-dimensional universe.⁴,⁵

j / One-sided thickenings of a circle and a line segment.

The nonexistence of FTL frames does not immediately rule out 
the possibility of FTL motion. (After all, we do have motion at 
c, but no frames moving at c.) For more about faster-than-light
motion in relativity, see section 4.7, p. 107.

3.9 ★ Thickening of a curve

3.9.1 A geometrical interpretation of the acceleration 

We’ve interpreted the acceleration vector as a measure of the 
curvature of a world-line, but to make this more than a tool for 
visualization, we would have to define what we mean by curvature. 
A good way to approach this is shown in figure j/1. Here a circle
of circumference $L$ has been expanded, like a loaf of rising bread, to
a circle of greater circumference $L^*$. This increase is only because
the circle is curved. If we do the same thing with a line segment,
j/2, there is no increase in length. The increase in the length tells
us about the curvature.

Quantitatively, suppose that the thickness of the shaded area is
$\Delta h$. Then the increase in circumference $\Delta L = L^* - L$ is given by

$$\frac{1}{L}\frac{\Delta L}{\Delta h} = k,$$

where $k$ is a measure of curvature, and $k = 1/r$ for a circle. We can
take this as a definition of the curvature of a curve embedded in a
two-dimensional Euclidean plane. The curves in figure j/1 both have
constant curvature, and if we had applied our definition to any short
segment of them, we would have gotten the same answer. For a curve
with varying curvature, such as a letter “S,” the curvature can be
defined as the appropriate limit at any given point, as the length of
the segment enclosing the point approaches zero. Note that we had
to pick an orientation for the expansion, i.e., a direction in which to
expand. Given this orientation, it makes sense to talk about signed
values of $h$ and $k$. If we choose the outward orientation for a circle,
then its $k$ is positive.
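
This definition lends itself to a direct numerical experiment. The sketch below thickens a short piece of the parabola y = x² near its vertex, where the standard formula for the curvature of a graph gives k = 2, and estimates k from the change in length. The function names, the choice of curve, and the step sizes are assumptions made for illustration.

```python
import math

def polyline_length(points):
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def thickening_curvature(f, df, x0, half_width=0.01, dh=1e-3, n=2000):
    """Estimate k = (1/L)(Delta L / Delta h) for the graph of f near x0.

    The curve is displaced a distance dh along the unit normal that points
    toward the lower side of the graph, so k comes out positive where the
    graph curves upward.
    """
    xs = [x0 - half_width + 2.0 * half_width * i / n for i in range(n + 1)]
    curve = [(x, f(x)) for x in xs]
    thickened = []
    for x in xs:
        slope = df(x)
        norm = math.hypot(1.0, slope)
        nx, ny = slope / norm, -1.0 / norm     # unit normal, lower side
        thickened.append((x + dh * nx, f(x) + dh * ny))
    L = polyline_length(curve)
    L_star = polyline_length(thickened)
    return (L_star - L) / (L * dh)

k_parabola = thickening_curvature(lambda x: x * x, lambda x: 2.0 * x, 0.0)
k_line = thickening_curvature(lambda x: 0.5 * x, lambda x: 0.5, 0.0)
print(k_parabola, k_line)    # close to 2 and to 0, respectively
```

Reversing the direction of the normal flips the sign of the result, which is the orientation dependence described in the text.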

An interesting point about this definition is that it is extrinsic
rather than intrinsic, in the sense defined in section 2.2.1, p. 45.
That is, it depends on how the curve is embedded in the ambient
two-dimensional space, and it depends on the Euclidean metric
of that space. Because a curve is a one-dimensional object, there
is nothing internal to the curve that would allow us to define its
curvature. Imagine yourself as a tiny bug, so tiny that you are
pointlike. If the curve represents your universe, then you can explore
it as much as you like, but you can never detect any internal evidence
of its curvature. This is not the case in two dimensions. For
example, a bug living on the two-dimensional surface of a sphere
can detect its curvature by drawing triangles and measuring how
much the sum of their interior angles differs from 180 degrees. This
would be an intrinsic measure of curvature. (See problem 4, p. 51.)

4 Gorini, “Linear Kinematical Groups,” Commun. Math. Phys. 21 (1971)
150. Open access via Project Euclid at
projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.cmp/1103857292.

5 Andreka et al., “A logic road from special relativity to general relativity,”
arxiv.org/abs/1005.0960, theorem 2.1.

The definition given above is readily extended from Euclidean
space to 1+1 dimensions of spacetime. Figure k shows a one-sided
thickening of an accelerated world-line. Although the shaded area
doesn’t look uniformly thick to our Euclidean eyes, it is. For example,
each of the dotted lines is orthogonal to the original world-line
on the left, and they all have the same length $\Delta h$ as measured
by an observer who traces that world-line. That is, each of these
lines could represent a rigid measuring rod carried by that observer,
drawn along a line that that observer considers to be a line of
simultaneity at that time. In analogy to the Euclidean case, we have

$$\frac{1}{\tau}\frac{\Delta \tau}{\Delta h} = a.$$
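
The Minkowski version can be tested in the same spirit as the Euclidean one. The sketch below assumes the constant-proper-acceleration world-line of example 4, displaces each event a proper distance Δh along the observer’s line of simultaneity, and compares proper times. Function names and numerical values are illustrative.

```python
import math

def proper_time(events):
    """Proper time along a polyline of timelike-separated (t, x) events, c = 1."""
    return sum(math.sqrt((t1 - t0) ** 2 - (x1 - x0) ** 2)
               for (t0, x0), (t1, x1) in zip(events, events[1:]))

def thickening_rate(a, dh=1e-3, tau_max=1.0, n=2000):
    """Estimate (1/tau)(Delta tau / Delta h) for hyperbolic motion."""
    original, thickened = [], []
    for i in range(n + 1):
        phi = a * tau_max * i / n              # a times the proper time
        t, x = math.sinh(phi) / a, math.cosh(phi) / a
        # Velocity vector: (cosh phi, sinh phi). The unit vector orthogonal
        # to it, along the line of simultaneity, is (sinh phi, cosh phi).
        original.append((t, x))
        thickened.append((t + dh * math.sinh(phi), x + dh * math.cosh(phi)))
    tau = proper_time(original)
    return (proper_time(thickened) - tau) / (tau * dh)

rate = thickening_rate(a=2.0)
print(rate)    # should be close to the proper acceleration, 2.0
```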

3.9.2 Bell’s spaceship paradox 

A variation on the situation shown in figure k leads to a paradox 
with philosophical implications proposed by John Bell. Bell went 
around the CERN cafeteria proposing the following thought exper- 
iment to the physicists eating lunch, and he found that nearly all of 
them got it wrong. Let two spaceships accelerate as shown in figure
l. Each ship is equipped with a yard-arm, and a thread is tied
between the two arms, l/1. Unaccelerated observer o uses Minkowski
coordinates $(t, x)$, as shown in l/2. The accelerations, as judged by
o, are equal for the two ships as functions of $t$. Does the thread
break, due to Lorentz contraction?

A crucial difference between figures k and l/2 is that in the former,
the thickening of the world-line has been carried out along the
dotted normals, whereas in the latter the second world-line is simply
a copy of the first that has been shifted to the right, parallel to the
x axis.

The popular answer in the CERN cafeteria was that the thread 
would not break, the reasoning being that Lorentz contraction is a 
frame-dependent effect, and no such contraction would be observed 
in the rockets’ frames. 

The error in this reasoning is that the accelerations of the two
ships were specified to be equal in frame o, not in the frames of the
rockets. The Minkowski coordinates $(t', x')$ shown in l/2 correspond
to the frame of an inertial observer o' who is momentarily moving
along with the trailing rocket after the acceleration has been going
on for a while. The x' axis is a line of simultaneity for o', and this
axis intersects the leading ship’s world-line at a point that o
considers to be later in time. Therefore o' says that the leading
ship has reached a higher speed than the trailing one. In o', the two
ships’ accelerations are unequal.

k / One-sided thickening of a world-line.

We can also see directly from the spacetime diagram that whereas
length $L_1$ is 4 units as measured by an observer initially at rest
relative to the thread, $L_2$ is about 5 units as measured by o', who is at
rest relative to the trailing end of the thread at a later time. Since
$L_2$ is greater than the unstressed length $L_1$, the thread is under
tension.

Figure l/3 is more in the spirit of Bell’s analysis. In frame o,
the thread has initial, unstressed length $L$. If the thread had been
attached only to the leading ship, then it would have trailed behind
it, unstressed, with Lorentz-contracted length $L/\gamma$. Since its actual
length according to o is still $L$, it has been stretched relative to its
unstressed length.
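
The stretching can be quantified with a short calculation. The sketch below makes an assumption that goes beyond the text: each ship is given constant proper acceleration a, starting from rest, so that in frame o the trailing ship follows x = sqrt(1/a² + t²) − 1/a and the leading ship is the same curve shifted by L. It then finds the event on the leading ship’s world-line that o' considers simultaneous with a chosen event on the trailing ship’s, and computes the proper distance between them. All names and numerical values are illustrative.

```python
import math

def trailing_x(t, a):
    """Trailing ship's position in frame o; it starts from rest at x = 0."""
    return math.sqrt(1.0 / a**2 + t * t) - 1.0 / a

def comoving_separation(t0, a, L):
    """Distance between the ships in the frame o' comoving with the trailing
    ship at time t0 (as measured in o). Returns (distance, gamma)."""
    x0 = trailing_x(t0, a)
    v = t0 / math.sqrt(1.0 / a**2 + t0 * t0)   # trailing ship's velocity at t0
    gamma = 1.0 / math.sqrt(1.0 - v * v)

    def g(t):
        # Zero when (t, x_lead(t)) lies on o' line of simultaneity
        # through (t0, x0), i.e. when t - t0 = v (x - x0).
        return (t - t0) - v * (trailing_x(t, a) + L - x0)

    lo, hi = t0, t0 + 1.0
    while g(hi) < 0.0:                          # bracket the root
        hi = t0 + 2.0 * (hi - t0)
    for _ in range(200):                        # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t1 = 0.5 * (lo + hi)
    dx = trailing_x(t1, a) + L - x0
    dt = t1 - t0
    return math.sqrt(dx * dx - dt * dt), gamma

L = 1.0
L2, gamma = comoving_separation(t0=1.0, a=1.0, L=L)
print(L2, gamma * L)    # L2 exceeds both L and gamma * L
```

Under this assumption the separation judged by o' grows without bound as the acceleration continues, so any real thread eventually breaks.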

This paradox relates to the difficult philosophical question of 
whether the time dilation and length contractions predicted by rel- 
ativity are “real.” This depends, of course, on what one means by 
“real.” They are frame-dependent, i.e., observers in different frames 
of reference disagree about them. But this doesn’t tell us much 
about their reality, since velocities are frame-dependent in Newto- 
nian mechanics, but nobody worries about whether velocities are 
real. Bell took his colleagues’ wrong answers as evidence that their 
intuitions had been misguided by the standard way of approaching 
this question of the reality of Lorentz contractions. 

This treatment has one wart on it, which is that we judged the 
distance between the two ships in a frame of reference instanta- 
neously comoving with the trailing ship, but this is slightly different 
from the length as determined in the leading ship’s frame. One 
way to remove this wart is to note that the fractional discrepancy
$\Delta L_1/L_1$ is of order $v^3$, which is negligible compared with the strain in
the thread, which is of order $v^2$. To carry out this type of error
estimation rigorously would however be cumbersome. A more elegant
and rigorous approach is given in section 9.5.5 on p. 206, where we 
use fancier techniques to show that the motion shown in figure k is 
the unique motion that allows every portion of the string to move 
without strain. 

This presentation includes ideas contributed by physicsforums
users tiny-tim and PeterDonis.

3.9.3 Déjà vu, jamais vu

Déjà vu all over again

In example 17 on p. 34, we saw that when an observer is ac- 
celerated, she may consider an event to be simultaneous with her 
more than once. That is, given a smooth, timelike world-line $r(\tau)$
parametrized by proper time, and an event E, which we take to be 
the origin of our coordinate system, there may be more than one 
time at which r is orthogonal to the velocity vector v (figure m/1). 
As remarked earlier, this is just a problem with applying a partic- 
ular arbitrary labeling convention to a certain example — not an 
earthshaking crisis in physics. Nevertheless it is of some intrinsic 
geometric interest to characterize the circumstances under which it 
can happen. We would like to place some kind of bound on how 
much acceleration is needed and how distant E must be. 

As a warm-up, consider the analogous problem in Euclidean 
space, m/2. Here we have the notion of a tubular neighborhood, 
which is the greatest thickening of a curve W such that no point in 
it lies on two different normals. The tubular neighborhood has a ra- 
dius r, which is the greatest possible radius of a non-self-intersecting 
piece of rope whose central axis coincides with W. Normally, as in 
region A, the rope doesn’t intersect itself. There are two qualita- 
tively different reasons why the rope could self-intersect. One is 
local: the radius of curvature of W is too small, as at B, where W 
coincides with a circle of radius r. The other is global: two points 
that are far apart as measured along W could be close together in 
the ambient Euclidean space, as at point C. 

If we carry over these ideas to Minkowski space, then the local 
case, m/3, is easy to analyze using the techniques we have devel- 
oped. The analog of the radius of curvature is the inverse of the 
proper acceleration, which suggests that we should be able to get 
a bound on the radius of the tubular neighborhood in terms of the 
acceleration. Define $f(\tau) = r \cdot v$. At a given point on W, $f$ is minus
the Minkowski time coordinate that an observer whose world-line is
W would assign, at that instant, to E. The condition for the type
of self-intersection we’re discussing is that both $f$ and its derivative
with respect to proper time $f'$ vanish at the same point on W.
Differentiating $f$ using the product rule, we find $f' = v \cdot v + r \cdot a = 1 + r \cdot a$
(in the $+---$ signature), so that $r \cdot a = -1$.

We now make use of the fact that both $a$ and $r$ are orthogonal
to W: the former as a general kinematic fact, and the latter because
$f = 0$. This means that they lie in the plane perpendicular
to W. The geometry of this plane is Euclidean, so we can apply
the Euclidean inequality $|a \cdot r| \le |a|\,|r|$, where the bars on the
left denote the absolute value and the ones on the right the magnitudes
of the vectors. We therefore have $|a|\,|r| \ge 1$. Since $r$ is
orthogonal to W, we can interpret it as the proper distance between
E and W. The magnitude of $a$ is the proper acceleration. Converting
to units with $c \ne 1$, we have an exact bound of the form
(proper distance)(proper acceleration) $\ge c^2$. In ordinary units $c$ is
large, so in this sense E must be distant, and the acceleration large.
This explains why we never encounter such a problem in nonrelativistic
physics.

n / The boundary point B really is special.
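
To get a feeling for the size of this bound, the sketch below evaluates the minimum proper distance c²/a for an acceleration comparable to the acceleration of gravity at the earth’s surface. The constant and function names, and the rounded value used for the light-year, are illustrative assumptions.

```python
C = 299_792_458.0        # speed of light in m/s
LIGHT_YEAR = 9.4607e15   # meters, rounded

def min_proper_distance(acceleration):
    """Smallest proper distance (m) at which simultaneity can fail to be
    unique, for a given proper acceleration (m/s^2): distance >= c^2 / a."""
    return C**2 / acceleration

d = min_proper_distance(9.8)
print(d, d / LIGHT_YEAR)   # roughly 9e15 m, a little under one light-year
```

Even for a sustained acceleration of this size, the events in question are about a light-year away, far outside everyday experience.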

That was never now 

So far we have characterized the circumstances under which si- 
multaneity can fail to be unique. Simultaneity can also fail to exist. 
For example, in the same notation, take W to be the constant- 
acceleration motion described in example 4, p. 61, and let E be the 
event $(-1, 0)$. Then it can easily be shown (problem 17) that $f(\tau)$
is always positive, so an observer moving along W will always consider
E to be in her past, never her future. No time exists for her
such that she considers E to be “now.” The function $f$ comes to a
minimum somewhere but never crosses zero.
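
A brute-force numerical check of this claim is straightforward, although it is no substitute for the coordinate-independent argument asked for in problem 17. The sketch assumes the world-line of example 4 in the form t = (1/a) sinh aτ, x = (1/a) cosh aτ and samples f over a range of proper times; the function name and sampling range are illustrative.

```python
import math

def f(tau, a=1.0, E=(-1.0, 0.0)):
    """f = r . v for hyperbolic motion, with r the vector from E to the
    observer's position, in the +- signature and units with c = 1."""
    t, x = math.sinh(a * tau) / a, math.cosh(a * tau) / a
    r = (t - E[0], x - E[1])
    v = (math.cosh(a * tau), math.sinh(a * tau))
    return r[0] * v[0] - r[1] * v[1]

samples = [f(-5.0 + 0.01 * i) for i in range(1001)]
print(min(samples))    # the smallest sampled value occurs at tau = 0
```

For this particular event the sampled values coincide with cosh(aτ), which is never less than 1.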

There will always be some neighborhood of W within which we 
are protected against nonexistence of simultaneity. To determine 
the radius of this neighborhood, we consider an event B that lies 
on the boundary of the neighborhood, and define $f$ in terms of B
rather than E. Then $f(\tau) = 0$ for some $\tau$, but $f$ does not cross over
zero, so that either $f \le 0$ everywhere or $f \ge 0$ everywhere. At the
place where $f(\tau) = 0$, we also have $f'(\tau) = 0$, and the rest of the
analysis is the same as before. Therefore the radius of the tubular 
neighborhood determined in that example defines a radius within 
which simultaneity has both existence and uniqueness. 

The interpretation of such a boundary point B is a little funny. 
Figure n recapitulates the motion described in example 4. For 
this motion, the only point like B is the one labeled (0, 0) in the 
Minkowski coordinates used in that derivation. Is there really any- 
thing special about this point, or is it just a random point that 
we happened to choose as the origin of our coordinate system? An 
observer moving along this W does not believe that any point in 
spacetime accessible to her has any special properties. She has 
always been accelerating and always will be, so no event she can 
observe or affect can be distinguished from the other events that 
she could have observed or affected in the same way at any earlier 
or later time. But we can easily show that B is special by giving 
a description of it without reference to any coordinates. Let W’s 
causal future be the set of all events that lie in the future light cone 
of some event on W, and similarly for W’s causal past. The bound- 
aries of these two sets are W’s past and future event horizons, and 
these horizons coincide at only one event, which is B. This seems 
paradoxical, but our observer can neither observe nor affect B, so 
there is no contradiction.

Problems


1 Fred buys a ticket on a spaceship that will accelerate to an 
ultrarelativistic speed v such that c — v is only 6 m/s. Fred was 
on the track team in high school, so he knows he can run about 8 
m/s. Once the ship is up to speed, Fred plans to run in the forward 
direction, thereby becoming the first human to exceed the speed of 
light. Other than the possible lack of gravity to allow running, what 
is wrong with Fred’s plan? 

2 (a) In the equation $v_c = (v_1 + v_2)/(1 + v_1 v_2)$ for combination
of velocities, interpret the case where one of the velocities (but not
the other) equals the speed of light. (b) Interpret the case where the
denominator goes to zero. (c) Use the geometric series to rewrite
the factor $1/(1 + v_1 v_2)$, and then expand the expression for $v_c$ as
a series in $v_1$ and $v_2$, retaining terms up to third order in velocity.
How does this relate to the correspondence principle?

3 Determine which of the identities in section 3.6 need to be
modified in order to be valid in units with $c \ne 1$, and describe how
they should be modified.

4 The Large Hadron Collider accelerates counterrotating beams 
of protons and collides them head-on. The beam energy has been 
gradually increased, and the accelerator is designed to reach a max- 
imum energy of 14 TeV, corresponding to a rapidity of 10.3. (a) 
Find the velocity of the beam, (b) In any collision, the kinetic en- 
ergy available to do something inelastic (smash up your car, produce 
nuclear reactions, . . . ) is the energy in the center of mass frame; 
in any other frame, there is initial kinetic energy that must also be 
present in the final state due to conservation of momentum. Sup- 
pose that a particular proton in the LHC beam never undergoes 
a collision with a proton from the opposite beam, and instead is 
wasted by being dumped into a beamstop. Let’s say that this colli- 
sion is with a proton in a hydrogen atom left behind by someone’s 
fingerprint. Find the velocities of the two protons in their common 
center of mass frame. 

5 Each GPS satellite is in an orbit with a radius of 26,600 
km, with an orbital period of half a sidereal day, giving it a velocity 
of 3.88 km/s. The atomic clock aboard such a satellite is tuned 
to 10.22999999543 MHz, which is chosen so that when the satellite 
is directly overhead, the effect of time dilation (transverse Doppler 
shift), combined with a general-relativistic effect due to gravity, re- 
sults in a frequency of exactly 10.23 MHz. (GPS started out as a 
military project, and legend has it that the top brass, suspicious of 
the crazy relativity stuff, demanded that the satellites be equipped 
with a software switch to turn off the correction, just in case the 
physicists were wrong.) There are oscillations superimposed onto 
these static effects due to the longitudinal Doppler shifts as the 
satellites approach and recede from a given observer on the ground.
(a) Calculate the maximum Doppler-shifted frequency for a hypothetical
observer in outer space who is being directly approached by
the satellite in its orbit, (b) In reality, the greatest possible longi- 
tudinal component of the velocity is considerably smaller than this 
due to the geometry. Use the size of the earth to determine this 
velocity and the corresponding maximum frequency. 

6 Verify directly, using the geometry of figure b/2 on p. 55, that
for $v = 3/5$, the Doppler shift factor is $D = 2$. (Do not simply plug
$v = 3/5$ into the formula $D = \sqrt{(1 + v)/(1 - v)}$.)

7 Generalize the numerical calculation of problem 6 to prove
the general result $D = \sqrt{(1 + v)/(1 - v)}$.

8 Expand the relativistic equation for the longitudinal Doppler 
shift of light D(v) in a Taylor series, and find the first two nonvanish- 
ing terms. Show that these two terms agree with the nonrelativistic 
expression, so that any relativistic effect is of higher order in v. 

9 Prove, as claimed on p. 64, that we must have a • v = 0 if the 
velocity four-vector is to remain properly normalized. 

10 Example 4 on p. 61 described the motion of an object having
constant proper acceleration $a$, the world-line being $t = \frac{1}{a}\sinh a\tau$
and $x = \frac{1}{a}\cosh a\tau$ in a particular observer’s Minkowski coordinates.

(a) Prove the following results for $\gamma$ and for the (three-) velocity and
(three-) acceleration measured by this observer.

$$\gamma = \cosh a\tau$$
$$v = \tanh a\tau$$
$$\text{acceleration} = a \cosh^{-3} a\tau$$

Do the calculations simply by taking the first and second derivatives
of position with respect to time. You will find the following facts
helpful:

$$1 - \tanh^2 = \cosh^{-2}$$

$$\frac{d}{dx}\tanh x = \cosh^{-2} x$$

(b) Interpret the results in the limit of large $\tau$.

11 Example 4 on p. 61 described the motion of an object having
constant proper acceleration $a$, the world-line being $t = \frac{1}{a}\sinh a\tau$
and $x = \frac{1}{a}\cosh a\tau$ in a particular observer’s Minkowski coordinates.
Find the corresponding velocity and acceleration four-vectors.

12 Starting from the results of problem 11, repeat problem 10a
using the techniques of section 3.7 on p. 66. You will find it helpful
to know that $1 - \tanh^2 = \cosh^{-2}$.

13 Let $v$ be a future-directed, properly normalized velocity
vector. Compare the value of $v \cdot v$ in the $+---$ signature used in
this book with its value in the signature $-+++$.

14 (a) Prove the relation $d\gamma/dt = \gamma^3 a_o \cdot v_o$ given on p. 67,
in the special case where the motion is linear. (b) Generalize the
result to 3+1 dimensions.

15 Derive the identity $a_o = \frac{1}{(o \cdot v)^2}[P_o a - (o \cdot a)v_o]$ on p. 67.

16 Recapitulating the geometry in figure m on p. 73, let W be a
smooth, timelike world-line, E an event not on W, and $r$ the vector
from E to a point on W, parametrized by proper time $\tau$. Define the
proper distance $\ell$ between E and a point on W as $\ell^2 = -(P_v r)^2$,
where the square indicates an inner product of the vector with itself,
and the minus sign is because we use the $+---$ signature. Show
that $d(\ell^2)/d\tau = 2(r \cdot v)(r \cdot a)(v \cdot v)$, where the final factor is just a
signature-dependent sign. Does this make sense when W is inertial?
Give an example where the derivative vanishes because the first
factor is zero, and another example where the second factor is the
one that vanishes (but $a \ne 0$).

17 Consider an observer O moving along a world-line W with
the constant-acceleration motion defined in example 4, p. 61. In
section 3.9.3, p. 73, we gave the coordinates of a certain event E
that was never “now” as described by our observer. The purpose
of this problem is to analyze this in a more elegant and
coordinate-invariant way. Let P be a point on W, let B be the event described in
section 3.9.3, and let $x = \overrightarrow{BP}$, $h = \overrightarrow{BE}$, and $r = \overrightarrow{EP}$. (a) Show that
W, which was originally described in a certain set of coordinates,
can instead be defined by the fact that $x \cdot v = 0$ for every point on
W. (b) Show that if $h$ is timelike, then $r \cdot v$ is never zero.

Chapter 4

Dynamics 

4.1 Ultrarelativistic particles 

A typical 22-caliber rifle shoots a bullet with a mass of about 3 
g at a speed of about 400 m/s. Now consider the firing of such 
a rifle as seen through an ultra-powerful telescope by an alien in 
a distant galaxy. We happen to be firing in the direction away 
from the alien, who gets a view from over our shoulder. Since the 
universe is expanding, our two galaxies are receding from each other. 
In the alien’s frame, our own galaxy is the one that is moving,
let’s say at $c - (200\ \text{m/s})$.¹ If the two velocities simply added,
the bullet would be moving at $c + (200\ \text{m/s})$. But velocities don’t
simply add and subtract relativistically, and applying the correct
equation for relativistic combination of velocities, we find that in
the alien’s frame, the bullet flies at only $c - (199.9995\ \text{m/s})$. That is,
according to the alien, the energy in the gunpowder only succeeded
in accelerating the bullet by 0.0005 m/s! If we insisted on believing
in $K = (1/2)mv^2$, this would clearly violate conservation of energy
in the alien’s frame of reference. It appears that kinetic energy must
not only rise faster than $v^2$ as $v$ approaches $c$, it must blow up to
infinity. This gives a dynamical explanation for why no material 
object can ever reach or exceed c, as we have already inferred on 
purely kinematical grounds. 
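
The arithmetic behind the alien's numbers can be checked with a short sketch. It assumes the standard collinear velocity-combination formula (u + v)/(1 + uv/c²) and the exact SI value of c. The function name `combine` is only illustrative. Exact rational arithmetic is used because the quantity of interest is a difference of about one part in 10¹² of c, which ordinary floating point would handle poorly.

```python
from fractions import Fraction

C = 299_792_458  # speed of light in m/s (exact SI value; an assumption of this sketch)

def combine(u, v, c=C):
    """Relativistic combination of two collinear velocities, in the same units as c."""
    u, v, c = Fraction(u), Fraction(v), Fraction(c)
    return (u + v) / (1 + u * v / c**2)

galaxy = C - 200   # our galaxy's speed in the alien's frame, m/s
bullet = 400       # muzzle velocity in our own frame, m/s

total = combine(galaxy, bullet)
shortfall = C - total    # how far below c the bullet ends up
gain = total - galaxy    # speed the bullet gains, as judged by the alien

print(f"bullet speed = c - {float(shortfall):.4f} m/s")
print(f"speed gained = {float(gain):.4f} m/s")
```

The gain comes out at roughly half a millimeter per second, in line with the figure quoted above.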

To the alien, both our galaxy and the bullet are ultrarelativistic 
objects, i.e., objects moving at nearly c. A good way of thinking 
about an ultrarelativistic particle is that it’s a particle with a very 
small mass. For example, the subatomic particle called the neutrino 
has a very small mass, thousands of times smaller than that of the 
electron. Neutrinos are emitted in radioactive decay, and because 
the neutrino’s mass is so small, the amount of energy available in 
these decays is always enough to accelerate it to very close to c. 
Nobody has ever succeeded in observing a neutrino that was not 
ultrarelativistic. When a particle’s mass is very small, the mass 
becomes difficult to measure. For almost 70 years after the neutrino
was discovered, its mass was thought to be zero. Similarly, we
currently believe that a ray of light has no mass, but it is always
possible that its mass will be found to be nonzero at some point
in the future. A ray of light can be modeled as an ultrarelativistic
particle.

1 In reality when two galaxies move at relativistic speeds compared with one
another, they are separated by a cosmological distance, and special relativity
does not actually allow us to construct frames of reference this large.

Let’s compare ultrarelativistic particles with train cars. A single 
car with kinetic energy E has different properties than a train of two 
cars each with kinetic energy E/2. The single car has half the mass 
and a speed that is greater by a factor of $\sqrt{2}$. But the same is not
true for ultrarelativistic particles. Since an idealized ultrarelativistic 
particle has a mass too small to be detectable in any experiment, 
we can’t detect the difference between m and 2m. Furthermore, 
ultrarelativistic particles move at close to c, so there is no observable 
difference in speed. Thus we expect that a single ultrarelativistic 
particle with energy E compared with two such particles, each with 
energy E/2, should have all the same properties as measured by a 
mechanical detector. 

An idealized zero-mass particle also has no frame in which it 
can be at rest. It always travels at c, and no matter how fast we 
chase after it, we can never catch up. We can, however, observe 
it in different frames of reference, and we will find that its energy 
is different. For example, distant galaxies are receding from us at 
substantial fractions of c, and when we observe them through a 
telescope, they appear very dim not just because they are very far 
away but also because their light has less energy in our frame than 
in a frame at rest relative to the source. This effect must be such 
that changing frames of reference according to a specific Lorentz 
transformation always changes the energy of the particle by a fixed 
factor, regardless of the particle’s original energy; for if not, then 
the effect of a Lorentz transformation on a single particle of energy 
E would be different from its effect on two particles of energy E/2. 

How does this energy-shift factor depend on the velocity v of
the Lorentz transformation? Here it becomes nicer to work in
terms of the variable D. Let’s write $f(D)$ for the energy-shift
factor that results from a given Lorentz transformation. Since a
Lorentz transformation $D_1$ followed by a second transformation $D_2$
is equivalent to a single transformation by $D_1 D_2$, we must have
$f(D_1 D_2) = f(D_1) f(D_2)$. This tightly constrains the form of the
function f; it must be something like $f(D) = D^n$, where n is a
constant. The interpretation of n is that under a Lorentz transformation
corresponding to 1% of c, energies of ultrarelativistic particles
change by about n% (making the approximation that v = 0.01 gives
$D \approx 1.01$). In his original 1905 paper on special relativity, Einstein
used Maxwell’s equations and the Lorentz transformation to show 
that for a light wave n = 1, and we will prove on p. 88 that this 
holds for any ultrarelativistic object. He wrote, “It is remarkable 
that the energy and the frequency . . . vary with the state of motion 
of the observer in accordance with the same law.” He was presum- 
ably interested in this fact because 1905 was also the year in which 
he published his paper on the photoelectric effect, which formed the
foundations of quantum mechanics. An axiom of quantum mechanics
is that the energy and frequency of any particle are related by
E = hf, and if E and f hadn’t transformed in the same way
relativistically, then quantum mechanics would have been incompatible
with relativity. 
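
As a numerical illustration of the multiplicative property, here is a minimal sketch in units where c = 1. It assumes D = sqrt((1+v)/(1−v)) and the usual collinear velocity-combination rule. The function names and the sample velocities 0.3 and 0.5 are arbitrary choices for the sketch.

```python
import math

def doppler(v):
    """Stretch factor D for a boost at velocity v, in units where c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def combine(u, v):
    """Relativistic combination of collinear velocities, c = 1."""
    return (u + v) / (1 + u * v)

def energy_shift(v, n=1):
    """Energy-shift factor f(D) = D**n, with n = 1 as stated in the text."""
    return doppler(v) ** n

v1, v2 = 0.3, 0.5
lhs = doppler(combine(v1, v2))     # D of the single equivalent transformation
rhs = doppler(v1) * doppler(v2)    # product of the two separate D values

print(lhs, rhs)                    # the two agree to rounding error
print(energy_shift(0.01))          # close to 1.01, i.e. about a 1% shift
```

Because D multiplies when boosts are composed, any power law $D^n$ automatically satisfies the constraint on f.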

If we assume that certain objects, such as light rays, are truly 
massless, rather than just having masses too small to be detectable, 
then their D doesn’t have any finite value, but we can still find how 
the energy differs according to different observers by finding the D 
of the Lorentz transformation between the two observers’ frames of 
reference. 

An astronomical energy shift Example 1 

> For quantum-mechanical reasons, a hydrogen atom can only 
exist in states with certain specific energies. By conservation 
of energy, the atom can therefore only absorb or emit light that 
has an energy equal to the difference between two such atomic 
energies. The outer atmosphere of a star is mostly made of 
monoatomic hydrogen, and one of the energies that a hydrogen 
atom can absorb or emit is $3.0276 \times 10^{-19}$ J. When we observe
light from stars in the Andromeda Galaxy, it has an energy of
$3.0306 \times 10^{-19}$ J. If this is assumed to be due entirely to the
motion of the Milky Way and Andromeda Galaxy relative to one 
another, along the line connecting them, find the direction and 
magnitude of this velocity. 

> The energy is shifted upward, which means that the Andromeda
Galaxy is moving toward us. (Galaxies at cosmological distances
are always observed to be receding from one another, but this
doesn’t necessarily hold for galaxies as close as these.) Relating
the energy shift to the velocity, with v > 0 for a receding source, we have

$$\frac{E_\text{emitted}}{E_\text{observed}} = D = \sqrt{\frac{1+v}{1-v}}.$$

Since the shift is only about one part per thousand, the velocity
is small compared to c, or small compared to 1 in units where
c = 1. Therefore we can employ the low-velocity approximation
$D \approx 1 + v$, which gives

$$v \approx D - 1 = \frac{E_\text{emitted}}{E_\text{observed}} - 1 = -1.0 \times 10^{-3}.$$

The negative sign confirms that the source is approaching rather
than receding. This is in units where c = 1. Converting to SI
units, where $c \ne 1$, we have $v = (-1.0 \times 10^{-3})c = -300$ km/s.
Although the Andromeda Galaxy’s tangential motion is not accu- 
rately known, it is considered likely that it will collide with the Milky 
Way in a few billion years. 
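
The numbers in this example can be reproduced with a few lines. This is a sketch under one stated assumption: D is taken as the ratio of emitted to observed energy, so that D > 1 (v > 0) means recession. The variable names are mine. The exact inversion of D = sqrt((1+v)/(1−v)) is included for comparison with the low-velocity approximation.

```python
C_KM_S = 299_792.458       # speed of light in km/s

E_emitted = 3.0276e-19     # J, energy in the frame of the hydrogen atom
E_observed = 3.0306e-19    # J, energy measured from Earth

D = E_emitted / E_observed           # assumed convention: D > 1 for a receding source

v_approx = D - 1                     # low-velocity approximation D ~ 1 + v
v_exact = (D**2 - 1) / (D**2 + 1)    # exact inversion of D = sqrt((1 + v)/(1 - v))

print(f"approximate: {v_approx * C_KM_S:.0f} km/s")
print(f"exact:       {v_exact * C_KM_S:.0f} km/s")
```

Both values are negative and agree with each other to better than 1 km/s, so the approximation is more than adequate at this speed.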


4.2 $E=mc^2$


We now know the relativistic expression for kinetic energy in the lim- 
iting case of an ultrarelativistic particle: its energy is proportional to 
the “stretch factor” D of the Lorentz transformation. What about 
intermediate cases, like v = c/2?


a / The match is lit inside the bell 
jar. It burns, and energy escapes 
from the jar in the form of light. Af- 
ter it stops burning, all the same 
atoms are still in the jar: none 
have entered or escaped. The fig- 
ure shows the outcome expected 
before relativity, which was that 
the mass measured on the bal- 
ance would remain exactly the 
same. This is not what happens 
in reality. 

When we are forced to tinker with a time-honored theory, our 
first instinct should always be to tinker as conservatively as possible. 
Although we’ve been forced to admit that kinetic energy doesn’t
vary as $v^2/2$ at relativistic speeds, the next most conservative thing
we could do would be to assume that the only change necessary
is to replace the factor of $v^2/2$ in the nonrelativistic expression for
kinetic energy with some other function, which would have to act
like D or 1/D for $v \to \pm c$. I suspect that this is what Einstein
thought when he completed his original paper on relativity in 1905, 
because it wasn’t until later that year that he published a second 
paper showing that this still wasn’t enough of a change to produce 
a working theory. We now know that there is something more that 
needs to be changed about prerelativistic physics, and this is the 
assumption that mass is only a property of material particles such 
as atoms (figure a). Call this the “atoms-only hypothesis.” 

Now that we know the correct relativistic way of finding the 
energy of a ray of light, it turns out that we can use that to find 
what we were originally seeking, which was the energy of a material 
object. The following discussion closely follows Einstein’s. 

Suppose that a material object O of mass $m_o$, initially at rest
in a certain frame A, emits two rays of light (or any other kind of 
ultrarelativistic particles), each with energy E/2. By conservation 
of energy, the object must have lost an amount of energy equal to 
E. By symmetry, O remains at rest. 

We now switch to a different frame of reference B moving at some 
arbitrary speed corresponding to a stretch factor D. The change 
of frames means that we’re chasing one ray, so that its energy is
scaled down to $(E/2)D^{-1}$, while running away from the other, whose
energy gets boosted to $(E/2)D$. In frame B, as in A, O retains the
same speed after emission of the light. But observers in frames A
and B disagree on how much energy O has lost, the discrepancy 
being 


$$E\left[\frac{1}{2}\left(D + D^{-1}\right) - 1\right].$$

This can be rewritten using identity [2] from section 3.6 as

$$E(\gamma - 1).$$


Let’s consider the case where B’s velocity relative to A is small.
Using the approximation $\gamma \approx 1 + v^2/2$, our result is approximately

$$\frac{1}{2}Ev^2,$$

neglecting terms of order $v^4$ and higher. The interpretation is that
when O reduced its energy by E in order to make the light rays, it
reduced its mass from $m_o$ to $m_o - m$, where m = E. Inserting the
necessary factor of $c^2$ to make this valid in units where $c \ne 1$, we
have Einstein’s famous

$$E = mc^2.$$


This derivation entailed both an approximation and some hidden 
assumptions. These issues are explored more thoroughly in section 
4.4 on p. 98 and in ch. 9 on p. 175. The result turns out to be valid 
for any isolated body. 
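
The two steps of the derivation above, the identity relating D to γ and the low-velocity limit, can be checked numerically. The following is only a sketch in units where c = 1, with D = sqrt((1+v)/(1−v)). The function names are illustrative.

```python
import math

def doppler(v):
    """Stretch factor D for velocity v, c = 1."""
    return math.sqrt((1 + v) / (1 - v))

def gamma(v):
    """Lorentz factor for velocity v, c = 1."""
    return 1 / math.sqrt(1 - v**2)

def discrepancy(E, v):
    """Disagreement between frames A and B about the energy lost: E[(D + 1/D)/2 - 1]."""
    D = doppler(v)
    return E * (0.5 * (D + 1 / D) - 1)

E = 1.0
for v in (0.001, 0.01, 0.1, 0.5):
    # columns: v, discrepancy, E(gamma - 1), Newtonian-looking (1/2) E v^2
    print(v, discrepancy(E, v), E * (gamma(v) - 1), 0.5 * E * v**2)
```

The second and third columns agree at every speed, while the last column matches them only when v is small, which is where the identification m = E is made.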

We find that mass is not simply a built-in property of the parti- 
cles that make up an object, with the object’s mass being the sum of 
the masses of its particles. Rather, mass and energy are equivalent, 
so that if the experiment of figure a is carried out with a sufficiently 
precise balance, the reading will drop because of the mass equivalent 
of the energy emitted as light. 

The equation $E = mc^2$ tells us how much energy is equivalent
to how much mass: the conversion factor is the square of the speed
of light, c. Since c is a big number, you get a really really big number
when you multiply it by itself to get $c^2$. This means that even
a small amount of mass is equivalent to a very large amount of 
energy. Conversely, an ordinary amount of energy corresponds to 
an extremely small mass, and this is why nobody detected the non- 
null result of experiments like the one in figure a hundreds of years 
ago. 

The big event here is mass-energy equivalence, but we can also 
harvest a result for the energy of a material particle moving at a 
certain speed. We have $m(\gamma - 1)$ for the difference between O’s
energy in frame B and its energy when it is at rest, i.e., its kinetic
energy. But since mass and energy are equivalent, we assign O an
energy m when it is at rest. The result is that the energy is

$$E = m\gamma$$

(or $m\gamma c^2$ in units with $c \ne 1$).


b / Top: A PET scanner. Middle:
Each positron annihilates with an 
electron, producing two gamma- 
rays that fly off back-to-back. 
When two gamma rays are ob- 
served simultaneously in the ring 
of detectors, they are assumed to 
come from the same annihilation 
event, and the point at which they 
were emitted must lie on the line 
connecting the two detectors. 
Bottom: A scan of a person’s 
torso. The body has concentrated 
the radioactive tracer around the 
stomach, indicating an abnormal 
medical condition. 


Electron-positron annihilation Example 2 

Natural radioactivity in the earth produces positrons, which are 
like electrons but have the opposite charge. A form of antimat- 
ter, positrons annihilate with electrons to produce gamma rays, a 
form of high-frequency light. Such a process would have been 
considered impossible before Einstein, because conservation of 
mass and energy were believed to be separate principles, and 
this process eliminates 100% of the original mass. The amount 
of energy produced by annihilating 1 kg of matter with 1 kg of 
antimatter is 


$$E = mc^2 = (2\ \text{kg})(3.0 \times 10^8\ \text{m/s})^2 = 2 \times 10^{17}\ \text{J},$$


which is on the same order of magnitude as a day’s energy con- 
sumption for the entire world’s population! 

Positron annihilation forms the basis for the medical imaging tech- 
nique called a PET (positron emission tomography) scan, in which 
a positron-emitting chemical is injected into the patient and map- 
ped by the emission of gamma rays from the parts of the body 
where it accumulates. 

A rusting nail Example 3 

> An iron nail is left in a cup of water until it turns entirely to rust. 
The energy released is about 0.5 MJ. In theory, would a suffi- 
ciently precise scale register a change in mass? If so, how much? 

> The energy will appear as heat, which will be lost to the envi- 
ronment. The total mass-energy of the cup, water, and iron will 
indeed be lessened by 0.5 MJ. (If it had been perfectly insulated, 
there would have been no change, since the heat energy would 
have been trapped in the cup.) The speed of light is $c = 3 \times 10^8$
meters per second, so converting to mass units, we have

$$m = \frac{E}{c^2} = \frac{0.5 \times 10^6\ \text{J}}{(3 \times 10^8\ \text{m/s})^2} = 6 \times 10^{-12}\ \text{kilograms}.$$


The change in mass is too small to measure with any practical 
technique. This is because the square of the speed of light is 
such a large number. 
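
The conversions in examples 2 and 3 go in opposite directions, and a small sketch makes the contrast in scale explicit. It uses the same rounded value of c as the examples. The helper names are hypothetical.

```python
C = 3.0e8  # speed of light in m/s, rounded as in the examples

def energy_from_mass(m_kg):
    """Energy equivalent, in joules, of a mass in kilograms."""
    return m_kg * C**2

def mass_from_energy(E_joules):
    """Mass equivalent, in kilograms, of an energy in joules."""
    return E_joules / C**2

annihilation = energy_from_mass(2.0)   # 1 kg of matter plus 1 kg of antimatter
nail = mass_from_energy(0.5e6)         # 0.5 MJ released by the rusting nail

print(f"{annihilation:.1e} J")
print(f"{nail:.1e} kg")
```

The same factor of $c^2$ that makes the first number enormous makes the second one unmeasurably small.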
Relativistic kinetic energy Example 4

By about 1930, particle accelerators had progressed to the point 
at which relativistic effects were routinely taken into account. In 
1964, W. Bertozzi did a special-purpose experiment to test the 
predictions of relativity using an electron accelerator. The results 
were discussed in less detail in example 2 on p. 58, at which point 
we had not yet seen the relativistic equation for kinetic energy. 
Electrons were accelerated through a static electric potential
difference V to a variety of kinetic energies K = eV, and their
velocities inferred by measuring their time of flight through a beamline
of length $\ell = 8.4$ m. Electrical pulses were recorded on an
oscilloscope at the beginning and end of the time of flight t. The
energies were confirmed by calorimetry. Figure c shows a sample
photograph of an oscilloscope trace at K = 1.5 MeV.


c / Example 4. Each horizontal division is 9.8 ns. Labels in the figure:
12 ns, Newtonian mechanics; 29 ns, special relativity.


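The prediction of Newtonian physics is as follows.

$$eV = \frac{1}{2}mv^2$$
$$v/c = 2.4$$
$$t = 12\ \text{ns}$$

According to special relativity, we have:

$$eV = m(\gamma - 1)c^2$$
$$v/c = \sqrt{1 - \left(1 + \frac{eV}{mc^2}\right)^{-2}} = 0.97$$
$$t = 29\ \text{ns}$$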


The results contradict the Newtonian prediction and are consis- 
tent with special relativity. According to Newton, this amount of 
energy should have accelerated the electrons to several times 
the speed of light. In reality, we see a clear demonstration of the 
nature of c as a limiting velocity. 
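
The two predictions can be recomputed from the quantities given in the example. This sketch assumes the standard electron rest energy of 0.511 MeV, which is not quoted in the text, and a rounded value of c. The variable names are illustrative.

```python
import math

C = 2.998e8      # speed of light, m/s
MC2 = 0.511      # electron rest energy m c^2 in MeV (assumed standard value)
K = 1.5          # kinetic energy eV of the electrons, MeV
LENGTH = 8.4     # length of the beamline, m

# Newtonian prediction: K = (1/2) m v^2
beta_newton = math.sqrt(2 * K / MC2)
t_newton = LENGTH / (beta_newton * C)

# Relativistic prediction: K = m (gamma - 1) c^2
gamma = 1 + K / MC2
beta_rel = math.sqrt(1 - 1 / gamma**2)
t_rel = LENGTH / (beta_rel * C)

print(f"Newtonian:    v/c = {beta_newton:.2f}, t = {t_newton * 1e9:.0f} ns")
print(f"Relativistic: v/c = {beta_rel:.2f}, t = {t_rel * 1e9:.0f} ns")
```

The Newtonian velocity exceeds c by more than a factor of two, while the relativistic one stays just below c, and the flight times differ by a similar factor.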



LIGHTS ALL ASKEW
IN THE HEAVENS 

Men of Science More or Less 
Agog Over Results of Eclipse 
Observations. 


EINSTEIN THEORY TRIUMPHS 


Stars Not Where They Seemed 
or Were Calculated to be, 
but Nobody Need Worry. 


A BOOK FOR 12 WISE MEN 


No More in All the World Could 
Comprehend It, Said Einstein When
His Daring Publishers Accepted It.


Gravity bending light Example 5 

Gravity is a universal attraction between things that have mass, 
and since the energy in a beam of light is equivalent to some 
very small amount of mass, light should be affected by gravity, 
although the effect should be very small. The first experimental 
confirmation of relativity came in 1919 when stars next to the sun 
during a solar eclipse were observed to have shifted a little from 
their ordinary position. (If there was no eclipse, the glare of the 
sun would prevent the stars from being observed.) Starlight had 
been deflected by the sun’s gravity. The figure is a photographic 
negative, so the circle that appears bright is actually the dark face 
of the moon, and the dark area is really the bright corona of the 
sun. The stars, marked by lines above and below them, appeared
at positions slightly different than their normal ones. 

Keep in mind that these arguments are very rough and qualita- 
tive, and it is not possible to produce a relativistic theory of gravity 
simply by taking $E = mc^2$ and combining it with Newton’s law of
gravity. After all, this law doesn’t refer to time at all: it predicts 
that gravitational forces propagate instantaneously. We know this 
can’t be consistent with relativity, which forbids cause and effect 
from propagating at any speed greater than c. To produce a rela- 
tivistic theory of gravity, we need general relativity. 

Similar reasoning suggests that there may be stars — black holes 
— so dense that their gravity can prevent light from leaving. Such 
stars have been detected, and their properties seem so far to be 
described correctly by general relativity.
4.3 Relativistic momentum

4.3.1 The energy-momentum vector 

Newtonian mechanics has two different measures of motion, ki- 
netic energy and momentum, and the relationship between them is 
nonlinear. Doubling your car’s momentum quadruples its kinetic 
energy. 

But nonrelativistic mechanics can’t handle massless particles, 
which are always ultrarelativistic. We saw in section 4.1 that ul- 
trarelativistic particles are “generic,” in the sense that they have 
no individual mechanical properties other than an energy and a 
direction of motion. Therefore the relationship between kinetic en- 
ergy and momentum must be linear for ultrarelativistic particles. 
For example, doubling the amplitude of an electromagnetic wave
quadruples both its energy density, which depends on $E^2$ and $B^2$,
and its momentum density, which goes like $E \times B$.

How can we make sense of these energy-momentum relation- 
ships, which seem to take on two completely different forms in the 
limiting cases of very low and very high velocities? 

The first step is to realize that since mass and energy are equivalent,
we will get more of an apples-to-apples comparison if we stop talking 
about a material object’s kinetic energy and consider instead its total 
energy E, which includes a contribution from its mass. 

Figure d is a graph of energy versus momentum. In this representation,
massless particles, which have $E \propto |p|$, lie on two diagonal
lines that connect at the origin. If we like, we can pick units such 
that the slopes of these lines are plus and minus one. Material par- 
ticles lie above these lines. For example, a car sitting in a parking 
lot has p = 0 and E = m. 

d / In the p-E plane, massless particles lie on the two
diagonals, while particles with mass lie to the right.

Now what happens to such a graph when we change to a different
frame of reference that is in motion relative to the original
frame? A massless particle still has to act like a massless particle,
so the diagonals are simply stretched or contracted along their own
lengths. A transformation that always takes a line to a line is a
linear transformation, and if the transformation between different
frames of reference preserves the linearity of the lines $p = E$ and
$p = -E$, then it’s natural to suspect that it is actually some kind of
linear transformation. In fact the transformation must be linear,
because conservation of energy and momentum involve addition, and
we need these laws to be valid in all frames of reference. But now
by the same reasoning as in subsection 1.3.1 on p. 22, the
transformation must be area-preserving. We then have the same three
cases to consider as in figure j on p. 16. The “Galilean” version is
ruled out because it would imply that particles keep the same energy
when we change frames. (This is what would happen if c were
infinite, so that the mass-equivalent $E/c^2$ of a given energy was zero,
and therefore E would be interpreted purely as the mass.) Nor can
the “rotational” version be right, because it doesn’t preserve the
$E = |p|$ diagonals. We are left with the third case, which establishes
the following aesthetically appealing fact:


Energy-momentum is a four-vector 

Let an isolated object have momentum and mass-energy p and E. 
Then the p-E plane transforms according to exactly the same kind 
of Lorentz transformation as the x-t plane. That is, $(E, p_x, p_y, p_z)$
is a four-dimensional vector just like $(t, x, y, z)$.


This is a highly desirable result. If it were not true, it would be 
like having to learn different mathematical rules for different kinds 
of three-vectors in Newtonian mechanics. 

The only remaining issue to settle is whether the choice of units 
that gives invariant 45-degree diagonals in the x-t plane is the same 
as the choice of units that gives such diagonals in the p-E plane. 
That is, we need to establish that the c that applies to x and t is
equal to the c′ needed for p and E, i.e., that the velocity scales of the
two graphs are matched up. This is true because in the Newtonian
limit, the total mass-energy E is essentially just the particle’s mass,
and then $p/E \approx p/m \approx v$. This establishes that the velocity scales
are matched at small velocities, which implies that they coincide for 
all velocities, since a large velocity, even one approaching c, can be 
built up from many small increments. (This also establishes that 
the exponent n defined on p. 80 equals 1 as claimed.) 

Suppose that a particle is at rest. Then it has p = 0 and mass-energy
E equal to its mass m. Therefore the inner product of its
(E, p) four-vector with itself equals $m^2$. In other words, the
“magnitude” of the energy-momentum four-vector is simply equal to the
particle’s mass. If we transform into a different frame of reference,
in which $p \ne 0$, the inner product stays the same. In symbols,

$$m^2 = E^2 - p^2,$$

or, in units with $c \ne 1$,

$$(mc^2)^2 = E^2 - (pc)^2.$$

We take this as the relativistic definition of mass. Since the
definition is an inner product, which is a scalar, it is the same in all
frames of reference. (Some older books use an obsolete convention
of referring to $m\gamma$ as “mass” and m as “rest mass.”)
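
To see the invariance at work, one can boost the energy-momentum vector of a particle at rest and recompute its mass. The sketch below works in 1+1 dimensions with c = 1 and assumes the same form of Lorentz transformation as for (t, x). The function names and the sample mass are illustrative.

```python
import math

def boost(E, p, v):
    """Lorentz-transform an energy-momentum vector (E, p) in 1+1 dimensions, c = 1."""
    g = 1 / math.sqrt(1 - v**2)
    return g * (E - v * p), g * (p - v * E)

def mass(E, p):
    """Invariant mass from the definition m^2 = E^2 - p^2."""
    return math.sqrt(E**2 - p**2)

m = 2.0
E, p = m, 0.0                     # particle at rest: p = 0 and E = m
for v in (0.1, 0.6, 0.99):
    E_new, p_new = boost(E, p, v)
    # E_new equals m*gamma, while the recomputed mass stays at m
    print(v, E_new, p_new, mass(E_new, p_new))
```

The energy and momentum grow without limit as v approaches 1, but the combination $E^2 - p^2$ does not change.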

self-check A 

Interpret the equation $m^2 = E^2 - p^2$ in the case where m = 0. >

Answer, p. ??

A high-precision test of this fundamental relativistic relationship
was carried out by Meyer et al. in 1963 by studying the motion of
electrons in static electric and magnetic fields. They define the
quantity

$$Y^2 = \frac{E^2}{m^2 + p^2},$$

which according to special relativity should equal 1. Their results,
tabulated in the sidebar, show excellent agreement with theory.

Results from Meyer et al., 1963

v        Y
0.9870   1.0002(5)
0.9881   1.0012(5)
0.9900   0.9998(5)

Mass of two light waves Example 6

Let the momentum of a certain light wave be $(p_t, p_x) = (E, E)$,
and let another such wave have momentum $(E, -E)$. The total
momentum is $(2E, 0)$. Thus this pair of massless particles has a
collective mass of 2E. This is an example of the non-additivity of
relativistic mass.

4.3.2 Collision invariants

Example 6 shows that mass is not additive, nor is it a measure
of the “quantity of matter.” More generally, suppose that we have a
collision between two objects, which could be two cars or two nuclei
in a particle accelerator. Conservation of (spatial) momentum
dictates that not all the energy is available for smashing windshields or
creating gamma rays. For example, a Martian watching a parking-lot
fender-bender through a powerful telescope would say that both
cars were going as fast as fighter jets, due to the rotation of the
earth, but this doesn’t make the bang any louder. To avoid being
misled by these frame-dependent distractions, we can concentrate
only on quantities that are scalars. For a two-body collision, there
are three such scalars that we can construct: $p_1^2$, $p_2^2$, and $p_1 \cdot p_2$.
(The notation $a^2$ is simply an abbreviation for $a \cdot a$.) These are
known as the collision invariants. The first two of these are simply
the squared masses of the individual particles.

Now consider the center of mass frame, i.e., the frame in which
the total momentum has a zero spacelike part. In this frame, the
total energy-momentum vector is of the form $(E_{cm}, 0)$, corresponding
to a mass $M = E_{cm}$. All of this energy is available to make
a bang. If we were colliding particles in an accelerator in order to
produce new particles, this collision would have just barely enough
energy to create a single particle of mass M, if the two incoming
particles were annihilated in the process. This center of mass energy
can be expressed in terms of the collision invariants as

$$M^2 = (p_1 + p_2)^2 = p_1^2 + p_2^2 + 2p_1 \cdot p_2 = m_1^2 + m_2^2 + 2p_1 \cdot p_2.$$

This is a nonlinear relationship, and the third collision invariant $p_1 \cdot p_2$ tells
us how the nonlinearity plays out based on the relative directions of
motion. The two momentum vectors are both timelike and
future-directed, so by the reversed triangle inequality (section 1.5, p. 36) we
have $M \ge m_1 + m_2$.
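
The center-of-mass energy written in terms of the collision invariants of section 4.3.2 lends itself to a quick numerical check. This sketch works in 1+1 dimensions with c = 1 and signature +−. The helper names, masses, and velocities are arbitrary choices. The second case reproduces the pair of light rays from example 6.

```python
import math

def dot(a, b):
    """Inner product of 1+1-dimensional vectors (t, x), signature +-."""
    return a[0] * b[0] - a[1] * b[1]

def momentum(m, v):
    """Energy-momentum vector (E, p) of a particle of mass m moving at velocity v."""
    g = 1 / math.sqrt(1 - v**2)
    return (m * g, m * g * v)

def center_of_mass_energy(p1, p2):
    """M computed from M^2 = p1^2 + p2^2 + 2 p1.p2."""
    return math.sqrt(dot(p1, p1) + dot(p2, p2) + 2 * dot(p1, p2))

# Two material particles colliding head-on.
m1, m2 = 1.0, 3.0
M = center_of_mass_energy(momentum(m1, 0.8), momentum(m2, -0.5))

# Two light rays of energy E moving in opposite directions.
E = 1.0
M_light = center_of_mass_energy((E, E), (E, -E))

print(M, m1 + m2)    # M exceeds the sum of the masses
print(M_light)       # equals 2E even though each ray is massless
```

Only scalars enter the calculation, so the same M would result from the momenta as measured in any other frame.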
4.3.3 Some examples involving momentum

Finding velocity given energy and momentum Example 7 

> If we know that a particle has mass-energy E and momentum p 
(which also implies knowledge of its mass m), what is its velocity? 

> In the particle’s rest frame it has a world-line that points straight 
up on a spacetime diagram, and its momentum vector p likewise 
points up in the p-E plane. Since displacement vectors and
momentum vectors transform according to the same rules, this 
parallelism will be maintained in other frames as well. Therefore 
in an arbitrarily chosen frame, the vector p = (E, p) lies along a 
line whose inverse slope v = p/E gives the velocity. 

As a check on our result, we look at its limiting behavior. In the
Newtonian limit, the mass-energy E is nearly all due to the mass,
so we have $v \approx p/m$, the Newtonian result. In the opposite limit
of ultrarelativistic motion, with $E \gg m$, the definition of mass
$m^2 = E^2 - p^2$ gives $E \approx |p|$, and we have $|v| \approx 1$, which is
also correct.
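
Both limits of v = p/E can be exhibited numerically. This is a minimal sketch in units where c = 1. The function names and the two sample momenta are chosen purely for illustration.

```python
import math

def velocity(E, p):
    """Velocity of a particle with mass-energy E and momentum p, c = 1."""
    return p / E

def energy(m, p):
    """Mass-energy obtained from the definition m^2 = E^2 - p^2."""
    return math.sqrt(m**2 + p**2)

m = 1.0
slow = velocity(energy(m, 1e-3), 1e-3)   # Newtonian regime, p much less than m
fast = velocity(energy(m, 1e3), 1e3)     # ultrarelativistic regime, p much greater than m

print(slow, 1e-3 / m)   # nearly equal to p/m
print(fast)             # just below 1
```
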

Light rays don’t interact Example 8 

We observe that when two rays of light cross paths, they continue 
through one another without bouncing like material objects. This 
behavior follows directly from conservation of energy-momentum. 

Any two vectors can be contained in a single plane, so we can 
choose our coordinates so that both rays have vanishing $p_z$. By
choosing the state of motion of our coordinate system appropriately,
we can also make $p_y = 0$, so that the collision takes place
along a single line parallel to the x axis. Since only $p_x$ is nonzero,
we write it simply as p. In the resulting p-E plane, there are two
possibilities: either the rays both lie along the same diagonal, or 
they lie along different diagonals. If they lie along the same di- 
agonal, then there can’t be a collision, because the two rays are 
both moving in the same direction at the same speed c, and the 
trailing one will never catch up with the leading one. 

Now suppose they lie along different diagonals. We add their 
energy-momentum vectors to get their total energy-momentum, 
which will lie in the gray area of figure d. That is, a pair of light 
rays taken as a single system act sort of like a material object 
with a nonzero mass. By a Lorentz transformation, we can al- 
ways find a frame in which this total energy-momentum vector 
lies along the E axis. This is a frame in which the momenta of the 
two rays cancel, and we have a symmetric head-on collision be- 
tween two rays of equal energy. It is the “center-of-mass” frame, 
although neither object has any mass on an individual basis. For 
convenience, let’s assume that the x-y-z coordinate system was 
chosen so that its origin was at rest in this frame. 

Since the collision occurs along the x axis, by symmetry it is not
possible for the rays after the collision to depart from the x axis;
for if they did, then there would be nothing to determine the
orientation of the plane in which they emerged.² Therefore we are
justified in continuing to use the same $p_x$-E plane to analyze the
four-vectors of the rays after the collision.

Let each ray have energy E in the frame described above. Given 
this total energy-momentum vector, how can we cook up two 
energy-momentum vectors for the final state such that energy and 
momentum will have been conserved? Since there is zero total 
momentum, our only choice is two light rays, one with energy- 
momentum vector $(E, E)$ and one with $(E, -E)$. But this is exactly
the same as our initial state, except that we can arbitrarily choose
the roles of the two rays to have been interchanged. Such an
interchanging is only a matter of labeling, so there is no observable
sense in which the rays have collided.³

Compton scattering Example 9 

Figure e/1 is a histogram of gamma rays emitted by a $^{137}$Cs
source and recorded by a NaI scintillation detector. This type
of detector, unlike a Geiger-Müller counter, gives a pulse whose
height is proportional to the energy of the radiation. About half the 
gamma rays do what we would like them to do in a detector: they 
deposit their full energy of 662 keV in the detector, resulting in a 
prominent peak in the histogram. The other half, however, inter- 
act through a process called Compton scattering, in which they 
collide with one of the electrons but emerge from the collision 
still retaining some of their energy, with which they may escape 
from the detector. The amount of energy deposited in the detec- 
tor depends solely on the billiard-ball kinematics of the collision, 
and can be determined from conservation of energy-momentum 
based on the scattering angle. Forward scattering at 0 degrees 
is no interaction at all, and deposits no energy, while scattering 


2 In quantum mechanics, there is a loophole here. Quantum mechanics allows 
certain kinds of randomness, so that the symmetry can be broken by letting the 
outgoing rays be observed in a plane with some random orientation. 

3 There is a second loophole here, which is that a ray of light is actually a 
wave, and a wave has other properties besides energy and momentum. It has 
a wavelength, and some waves also have a property called polarization. As a 
mechanical analogy for polarization, consider a rope stretched taut. Side-to-side 
vibrations can propagate along the rope, and these vibrations can occur in any 
plane that coincides with the rope. The orientation of this plane is referred to 
as the polarization of the wave. Returning to the case of the colliding light rays, 
it is possible to have nontrivial collisions in the sense that the rays could affect 
one another’s wavelengths and polarizations. Although this doesn’t actually 
happen with non-quantum-mechanical light waves, it can happen with other 
types of waves; see, e.g., Hu et al., arxiv.org/abs/hep-ph/9502276, figure 2. 
The title of example 8 is only valid if a “ray” is taken to be something that lacks 
wave structure. The wave nature of light is not evident in everyday life from 
observations with apparatus such as flashlights, mirrors, and eyeglasses, so we 
expect the result to hold under those circumstances, and it does. E.g., flashlight 
beams do pass through one another without interacting.
at 180 degrees deposits the maximum energy possible if the only
interaction inside the detector is a single Compton scattering. We 
will analyze the 180-degree scattering, since it can be tackled in 
1+1 dimensions. 

e / 1 . The Compton edge lies at 
the energy deposited by gamma 
rays that scatter at 180 degrees 
from an electron. 2. The colli- 
sion in the lab frame. 3. The 
same collision in the center of 
mass frame. 




Figure e/2 shows the collision in the lab frame, where the elec- 
tron is initially at rest. As is conventional in this type of diagram, 
the world-line of the photon is shown as a wiggly line; the wig- 
gles are just a decoration, and the actual world-line consists of 
two line segments. The photon enters the detector with the full
energy $E_0 = 662$ keV and leaves with a smaller energy $E_f$. The
difference $E_0 - E_f$ is what the detector will measure, contributing
a count to the Compton edge. In the lab frame, the total initial
momentum vector is $p = (E_0 + m, E_0)$, with the timelike compo-
nent representing the total mass-energy. Because the photon is
massless, its momentum $p_x = E_0$ is equal to its energy.

Let v be the velocity of the center-of-mass frame, e/3, relative to
the lab frame. Using the result of example 7, we find $v = E_0/(E_0 + m)$.
To make the writing easier we define $\alpha = E_0/m$, so that
$v = \alpha/(1 + \alpha)$.

The transformation from the lab frame to the c.m. frame Doppler
shifts the energy of the incident photon down to $E' = D(-v)E_0$.
The collision reverses the spatial part of the photon’s energy-
momentum vector while leaving its energy the same. Transfor-
mation back into the lab frame gives $E_f = D(-v)E' = D(-v)^2 E_0 = E_0/(1 + 2\alpha)$.
(This can also be rewritten using the quantum-
mechanical relation $E = hc/\lambda$ to give the compact form $\lambda_f - \lambda_0 = 2hc/m$.)
The final result for the energy of the Compton edge is

$$E_0 - E_f = \frac{E_0}{1 + 1/(2\alpha)} = 478\ \text{keV},$$

in good agreement with figure e/1.
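
The arithmetic of example 9 can be checked numerically. The following is a minimal sketch, not part of the original analysis. It assumes the Doppler factor has the form D(v) = sqrt((1+v)/(1-v)) in units with c = 1 (so that D(-v) is a redshift), and it assumes an electron mass-energy of about 511 keV, a value that is not quoted in the text. The function names are illustrative.

```python
import math

# Assumed value, not given in the text: electron rest mass-energy in keV.
ELECTRON_MASS_KEV = 510.999

def doppler(v):
    """Doppler factor D(v) = sqrt((1 + v)/(1 - v)), in units where c = 1 (assumed form)."""
    return math.sqrt((1.0 + v) / (1.0 - v))

def compton_edge(e0_kev, m_kev=ELECTRON_MASS_KEV):
    """Energy deposited when a photon of energy e0_kev backscatters at 180 degrees."""
    alpha = e0_kev / m_kev
    v = alpha / (1.0 + alpha)          # velocity of the c.m. frame relative to the lab
    e_f = doppler(-v) ** 2 * e0_kev    # lab -> c.m. -> lab: two factors of D(-v)
    return e0_kev - e_f

def compton_edge_closed_form(e0_kev, m_kev=ELECTRON_MASS_KEV):
    """Closed form E_0 / (1 + 1/(2 alpha)) from the end of the example."""
    alpha = e0_kev / m_kev
    return e0_kev / (1.0 + 1.0 / (2.0 * alpha))

edge = compton_edge(662.0)
edge_closed = compton_edge_closed_form(662.0)
print(f"Compton edge from two Doppler shifts: {edge:.2f} keV")
print(f"Compton edge from the closed form:    {edge_closed:.2f} keV")
```

Both routes should agree with each other and round to the 478 keV quoted above.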

Pair production requires matter Example 10

Example 2 on p. 84 discussed the annihilation of an electron and 
a positron into two gamma rays, which is an example of turning 
matter into pure energy. An opposite example is pair production,
a process in which a gamma ray disappears, and its energy goes 
into creating an electron and a positron. 

Pair production cannot happen in a vacuum. For example, gamma 
rays from distant black holes can travel through empty space for 
thousands of years before being detected on earth, and they don’t 
turn into electron-positron pairs before they can get here. Pair 
production can only happen in the presence of matter. When 
lead is used as shielding against gamma rays, one of the ways 
the gamma rays can be stopped in the lead is by undergoing pair 
production. 

To see why pair production is forbidden in a vacuum, consider the 
process in the frame of reference in which the electron-positron 
pair has zero total momentum. In this frame, the gamma ray 
would have to have had zero momentum, but a gamma ray with 
zero momentum must have zero energy as well. This means 
that conservation of the momentum vector has been violated: the 
timelike component of the momentum is the mass-energy, and it 
has increased from 0 in the initial state to at least $2mc^2$ in the final
state. 

4.3.4 Massless particles travel at c 

Massless particles always travel at c (= 1). For suppose that a
massless particle had $|v| < 1$ in the frame of some observer. Then
some other observer could be at rest relative to the particle. In
such a frame, the particle’s momentum p is zero by symmetry, since
there is no preferred direction for it. Then $E^2 = p^2 + m^2$ is zero
as well, so the particle’s entire energy-momentum vector is zero. 
But a vector that vanishes in one frame also vanishes in every other 
frame. That means we’re talking about a particle that can’t undergo 
scattering, emission, or absorption, and is therefore undetectable by 
any experiment. This is physically unacceptable because we don’t 
consider phenomena (e.g., invisible fairies) to be of physical interest 
if they are undetectable even in principle. 

What about the case of a material particle, i.e., one having mass? 
Since we already have an equation $E = m\gamma$ for the energy of a ma-
terial particle in terms of its velocity, we can find a similar equation
for the momentum,

$$p = \sqrt{E^2 - m^2} = m\sqrt{\gamma^2 - 1} = m\gamma v$$

(a relation that is useful in its own right, and has been verified
experimentally, figure f). As a material particle gets closer and closer to



f / Two early high-precision 
tests of the relativistic equation 
$p = m\gamma v$ for the momentum of
a material particle. Graphing 
p/m rather than p allows the 
data for electrons and protons to 
be placed on the same graph. 
The very small error bars for 
the data point from Zrelov are 
represented by the height of the 
black rectangle. 
c, its momentum approaches infinity, so that an infinite force would
be required in order to reach c. 

In summary, massless particles always move at v = c, while 
massive ones always move at v < c. 

Note that the equation $p = m\gamma v$ isn’t general enough to serve as
a definition of momentum, since it becomes an indeterminate form
in the limit $m \to 0$.
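
As a quick numerical illustration (a sketch in natural units with c = 1, with illustrative function names), one can tabulate E = mγ and p = mγv for a unit mass and confirm that p agrees with sqrt(E² − m²), and that it grows without bound as v approaches 1.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def energy_and_momentum(m, v):
    """Return (E, p) = (m*gamma, m*gamma*v) for a material particle."""
    g = gamma(v)
    return m * g, m * g * v

rows = []
for v in (0.1, 0.5, 0.9, 0.99, 0.999):
    E, p = energy_and_momentum(1.0, v)
    p_from_energy = math.sqrt(E * E - 1.0)   # sqrt(E^2 - m^2) with m = 1
    rows.append((v, E, p, p_from_energy))
    print(f"v = {v:<6} E = {E:9.4f}  m*gamma*v = {p:9.4f}  sqrt(E^2 - m^2) = {p_from_energy:9.4f}")
```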

No half-life for massless particles Example 11

When we describe an unstable nucleus or other particle as hav- 
ing some half-life, we mean its half-life in its own rest frame. A 
massless particle always moves at c and therefore has no rest 
frame (section 3.4), so it doesn’t make sense to describe it as 
having a half-life in this sense. This is almost, but not quite, the 
same thing as saying that massless particles can never decay. 4 

Constraints on polarization Example 12 

We observe that electromagnetic waves are always polarized 
transversely, never longitudinally. Such a constraint can only ap- 
ply to a wave that propagates at c. If it applied to a wave that 
propagated at less than c, we could move into a frame of refer- 
ence in which the wave was at rest. In this frame, all directions in 
space would be equivalent, and there would be no way to decide 
which directions of polarization should be permitted. 

4.3.5 Evidence as to which particles are massless 

Which of the fundamental particles are massless, and which are 
not? This can only be determined empirically, and we have at
least one example, the neutrino, that was formerly thought to be 
massless but is now believed to be massive. For more about the 
neutrino, see section 4.7.2, p. 109. In the present section we discuss 
bounds on the masses of the photon and the graviton. 5 We omit 
a discussion of the gluon, which would be complicated by the fact 
that the gluon is never observed as a free particle or as a classical 
field. This section can be skipped without loss of continuity. 

Some readers may exclaim at this point that of course photons 
must be massless, because light has to travel at the speed of light. 
But it should be clear from the foregoing presentation that the c 
in relativity is not to be interpreted as the speed of light, but as a 


4 See Fiore and Modanese, arxiv.org/abs/hep-th/9508018,
and http://physics.stackexchange.com/questions/12488/decay-of-massless-particles.
If such a process does exist, then Lorentz
invariance requires that its time-scale be proportional to the particle’s energy. 
It can be argued that gluons, which are massless, do in fact undergo decay into 
less energetic gluons, but the interpretation is ambiguous because we never 
observe gluons as free particles, so we can’t just capture one in a box and watch 
it rattle around inside until it decays. 

5 For an in-depth review of this topic, see Goldhaber and Nieto, “Photon and
Graviton Mass Limits,” http://arxiv.org/abs/0809.1003.
kind of conversion factor between space and time. If photons have a
small but nonvanishing mass, relativity does not have a stake driven 
through its heart. 

If we want to test whether the photon is massless, the most 
straightforward technique would seem to be to measure its time of 
flight as it travels some distance, and see if it goes slower than c. 
There is a difficulty here because our methods for measuring large 
distances, e.g., GPS, generally assume that light travels at c. How- 
ever, if the photon has some mass, then its velocity should depend 
on its energy, so we can instead test whether the speed of a photon 
depends on its energy. From quantum mechanics, this is related 
to its frequency by E = hf, so we are essentially testing whether 
the speed of light in a vacuum depends on frequency. Presently 
the best experimental tests of the invariance of the speed of light 
with respect to wavelength come from astronomical observations 
of gamma-ray bursts, which are sudden outpourings of high-energy 
photons, believed to originate from a supernova explosion in another 
galaxy. One such observation, in 2009, 6 collected photons from such 
a burst, with a duration of 2 seconds, indicating that the propaga- 
tion time of all the photons differed by no more than 2 seconds out 
of a total time in flight on the order of ten billion years, or about 
one part in $10^{17}$!
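
The order-of-magnitude estimate above, and the energy dependence that a time-of-flight test looks for, can be sketched as follows. This is only an illustration in units with c = 1: the speed function uses v = p/E with p = sqrt(E² − m²), and the masses and energies passed to it are arbitrary, hypothetical numbers, not measured values.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def speed(mass, energy):
    """Speed v = p/E with p = sqrt(E^2 - m^2), in units where c = 1."""
    return math.sqrt(energy ** 2 - mass ** 2) / energy

# Fractional spread in arrival times: 2 s out of roughly ten billion years.
burst_duration_s = 2.0
flight_time_s = 1e10 * SECONDS_PER_YEAR
fractional_spread = burst_duration_s / flight_time_s
print(f"fractional spread in travel time: {fractional_spread:.1e}")

# Hypothetical illustration: a particle with nonzero mass travels faster at higher energy.
slow = speed(1.0, 10.0)
fast = speed(1.0, 1000.0)
print(f"v at E = 10 m:   {slow:.7f}")
print(f"v at E = 1000 m: {fast:.7f}")
print(f"v for m = 0:     {speed(0.0, 5.0):.7f}")
```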

It turns out, however, that the limits on the mass of the pho- 
ton imposed by time of flight measurements can be improved on by 
many orders of magnitude using other methods. In the standard 
model of particle physics, forces are transmitted by the exchange of 
particles. We’ll concentrate here on static forces. An electrostatic 
force is transmitted by the exchange of photons, and a static grav- 
itational force by the exchange of gravitons. Gravity is not part of 
the standard model of particle physics, and individual gravitons can-
not be directly detected by any foreseeable technology, 7 but there
are fundamental reasons for believing that they must exist, and in 
any case our discussion is mathematically identical for gravity and 
electromagnetism. We will therefore discuss electromagnetic fields 
and then note the corresponding results for gravity. 

If we imagine the field surrounding a stationary point charge as a 
swarm of photons, then the first question that occurs to us is what 
is the source of the energy needed in order to create them. The 
standard hand-waving argument is as follows. In addition to the 
usual momentum-position form of the Heisenberg uncertainty princi-
ple $\Delta p\,\Delta x \gtrsim h$, there is an energy-time form $\Delta E\,\Delta t \gtrsim h$. This looks
obvious by analogy when we consider that relativistically, energy 
and momentum are different parts of the energy-momentum four- 
vector, and likewise for time and position. We can interpret this 

6 http://arxiv.org/abs/0908.1832

7 Rothman and Boughn, “Can Gravitons Be Detected?,” http://arxiv.org/abs/gr-qc/0601043



g / An artist’s conception of a 
gamma-ray burst, resulting from 
a supernova explosion. 
h/1. Field lines of a point 
charge. Observations within the 
small region indicated by a box 
allow one to determine how far 
away the charge is. 2. Field 
lines of an infinite capacitor plate, 
according to standard electro- 
magnetism. Observations within 
the box do not give information 
about how far away the charge is. 
3. A violation of Gauss’s law. 


to mean that it is possible, for short periods of time, to cheat the 
law of conservation of energy. We can steal a little energy but then 
pay it back immediately, as long as the duration of the loan is no 
more than about $t \sim h/E$. During this time, a virtual particle can
travel a distance of no more than $\sim hc/E$. Now for a massless par-
ticle, this energy can be as small as desired, so the force can reach
to arbitrarily large distances. But for a massive particle, we have
the relativistic relation $E^2 - p^2 = m^2$, which requires $E \ge m$, or
$E \ge mc^2$ in SI units. This minimum energy corresponds to a maxi-
mum range $\sim h/mc$. In general, we expect that the field carried by
a massive particle will fall off more quickly with distance than the
field of a massless particle, and we expect that this fall-off will be
parametrized somehow by a length scale $h/mc$.

How would we expect this to play out in the classical theory 
of electromagnetic fields? Consider a point charge, figure h/1. Its 
field lines are straight, and they spread out in all directions, so by 
observations of any region of space, we can trace the lines backward 
to see where they would have intersected. That is how far it is from 
our region of space to the charge. This is a kind of parallax mea- 
surement. In the case of gravity, this is exactly what Eratosthenes 
did in order to measure the radius of the earth. 

But now let’s consider the case of an infinite, plane capacitor 
plate with some charge on it, h/2. The field lines don’t spread, 
so the parallax method doesn’t work. If we examine the field in 
some small region of space, there should be no way to determine the 
distance to the capacitor plate. If we believe in Gauss’s law, then 
the solution is simple: the field is constant in both magnitude and 
direction, so although it tells us the direction of the nearest point 
on the plate, it tells us nothing about the distance to that point. 

But if the photon is massive, we expect fields to fall off more 
rapidly with distance than they would according to standard theory. 
In this example, the standard theory says that the field does not 
decrease at all with distance, so for a massive photon we expect that 
it does fall off. This will violate Gauss’s law, but we still expect that 
the distance to the plate will not be determinable by examination of 
a small region of space: if the field equations are linear, then a field
with a given strength could be from a nearby capacitor plate with 
a small charge density, or a more distant one with more charge. 

If we’re willing to violate Gauss’s law, then we can have field lines
simply terminate in empty space, h/3, and this will cause the field
strength to decrease. As we traverse a small distance dx, moving
away from the plate, some fraction of the field lines should termi-
nate, leading to a corresponding fractional reduction $dE/E$ in the
field strength. The ratio $(dE/E)/dx$ must be constant, and this
can only happen if we have $E \propto e^{-\mu x}$, where $\mu$ is a constant with
units of inverse length. (On the other side of the plate, where x is
negative, we have $+\mu x$ inside the exponential.) For the reasons dis-
cussed above, we actually expect that $\mu$ equals $mc/h$ multiplied by
a unitless constant of order unity. In fact, it can be shown that the
unitless constant is a factor of $2\pi$, so $\mu$ is simply the mass, expressed
in units where both c and $\hbar$ equal 1.

Since the field of a capacitor plate is equal to the superposition 
of the fields of all the charges distributed uniformly on it, our result 
that the capacitor’s field falls off in a certain way tells us something 
corresponding about the field of a point charge. We expect that the 
field of a point charge q is 


$$E = \frac{kq\,e^{-\mu r}}{r^2},$$

where Coulomb’s law is recovered in the case $\mu = 0$. This form
was originally inferred by Yukawa for nuclear forces, which really do 
have a finite range. 

We now have an extraordinarily sensitive way of placing a limit 
on the masses of the photon and graviton. Even if $\mu$ is very small, we
can make observations on very large distance scales, and static forces 
should fall off exponentially. In the case of gravitational forces, 
we observe that gravity does operate, with no detectable Yukawa- 
style attenuation, on scales comparable to the size of the observable 
universe, on the order of billions of light-years. This corresponds 
to a limit on the mass of the graviton of $\sim 10^{-69}$ kg, surely the
smallest mass scale that has ever been probed by human beings! 
Measurements of the magnetic field of Jupiter by the Pioneer 10 
space probe limit the mass of the photon to no more than about 
$8 \times 10^{-52}$ kg, which is almost as impressive.
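
To see how a distance scale translates into a mass limit, here is a rough sketch based on the relation μ = mc/ħ, i.e., a range of about ħ/(mc). The constants are standard SI values that do not appear in the text, and the choice of 10^10 light-years as a stand-in for "billions of light-years" is an assumption made only for illustration.

```python
HBAR = 1.054571817e-34      # reduced Planck constant, J s (standard value)
C = 2.99792458e8            # speed of light, m/s
LIGHT_YEAR_M = 9.4607e15    # one light-year in meters

def mass_limit_kg(range_m):
    """Mass whose Yukawa length hbar/(m c) equals range_m."""
    return HBAR / (C * range_m)

def yukawa_range_m(mass_kg):
    """Attenuation length 1/mu = hbar/(m c) for a force carrier of the given mass."""
    return HBAR / (mass_kg * C)

# Assumed scale for gravity: 1e10 light-years, standing in for "billions of light-years".
graviton_limit_kg = mass_limit_kg(1e10 * LIGHT_YEAR_M)
print(f"graviton mass limit: ~{graviton_limit_kg:.1e} kg")

# Length scale probed by the photon-mass bound quoted in the text.
photon_range_m = yukawa_range_m(8e-52)
print(f"range for an 8e-52 kg photon: ~{photon_range_m:.1e} m")
```

The first number lands within an order of magnitude of the graviton bound quoted above, and the second is a few hundred thousand kilometers, a scale comparable to the extent of a giant planet's magnetic field.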

Although today’s tightest bounds are from solar-system and cos- 
mological measurements, historically some very precise tabletop ex- 
periments were carried out. Laboratory experiments are always de- 
sirable in such cases because the conditions can be controlled, and 
the experiments can be replicated. Problem 21 on p. 115 is an anal- 
ysis of such an experiment. 

4.3.6 No global conservation of energy-momentum in general 
relativity 

If you read optional chapter 2, you know that the distinction 
between special and general relativity is defined by the flatness 
of spacetime, and that flatness is in turn defined by the path- 
independence of parallel transport. Whereas energy is a scalar in 
Newtonian mechanics, in relativity it is the timelike component of 
a vector. It therefore follows that in general relativity we should 
not expect to have global conservation of energy. For a conservation 
law is a statement that when we add up a certain quantity, the total 
has a constant value. But if spacetime is curved, then there is no 
natural, uniquely defined way to compare vectors that are defined at 
different places in spacetime. We could parallel transport one over
to the other, but the result would depend on the path along which 
we chose to transport it. For similar reasons, we should not expect 
global conservation of momentum. 

This is the answer to a frequently asked question about cosmol- 
ogy. Since 1998 we’ve known that the expansion of the universe is 
accelerating, rather than decelerating as we would have expected due 
to gravitational attraction. What is the source of the ever-increasing 
kinetic energy of all those galaxies? The question assumes that en- 
ergy must be conserved on cosmological scales, but that just isn’t 
so. 

Nevertheless, general relativity reduces to special relativity on 
scales small enough to make curvature effects negligible. Therefore it 
is still valid to expect conservation of energy and momentum to hold 
locally, as assumed, e.g., in the analysis of Compton scattering in
example 9 on p. 91, and verified in countless experiments. Cf. section 
9.2, p. 179, on the stress-energy tensor. 

4.4 ★ Systems with internal structure

Section 4.2 presented essentially Einstein’s original proof of $E = mc^2$,
which has been criticized on several grounds. A detailed discus-
sion is given by Ohanian. 8 Putting aside questions that are purely
historical or concerned only with academic priority, we would like 
to know whether the proof has logical flaws, and also whether the 
claimed result is only valid under certain conditions. We need to 
consider the following questions: 


1. Does it matter whether the system being described has finite 
spatial extent, or whether the system is isolated? 



i/The world lines of two beads 
bouncing back and forth on a 
wire. 


2. Does it matter whether parts of the system are moving at 
relativistic velocities? 

3. Does the low-velocity approximation used in Einstein’s proof 
make a difference? 

4. How do we handle a system that is not made out of point- 
like particles, e.g., a capacitor, in which some of the energy- 
momentum is in an electric field? 


The following example demonstrates issues 1-3 and their logical 
connections; the definitional question 4 is addressed in ch. 9. Sup- 
pose that two beads slide freely on a wire, bouncing elastically off 
of each other and also rebounding elastically from the wire’s ends. 
Their world-lines are shown in figure i. Let’s say the beads each 

8 “Einstein’s $E = mc^2$ mistakes,” arxiv.org/abs/0805.1400
have unit mass. In frame o, the beads are released from the center
of the wire with velocities $\pm u$. For concreteness, let’s set $u = 1/2$,
so that the system has internal motion at relativistic speeds. In
this frame, the total energy-momentum vector of the system, on the
surface of simultaneity labeled 1 in figure i, is p = (2.31, 0). That is,
it has a total mass-energy of 2.31 units, and a total momentum of
zero (meaning that this is the center of mass frame). As time goes
on, an observer in this frame will say that the beads reach the ends of
the wire simultaneously, at which point they rebound, maintaining
the same total energy-momentum vector p. The mass of the system
is, by definition, $m = |p| = \sqrt{2.31^2 - 0^2} = 2.31$, and this mass remains
constant as the beads bounce back and forth.

Now let’s transform into a frame o', moving at a velocity v = 1/2 
relative to o. If velocities added linearly in relativity, then the initial 
velocities of the beads in this frame would be 0 and — 1, but of course 
a material object can’t move with speed $|v| = c = 1$, and velocities
don’t add linearly. Applying the correct velocity addition formula for 
relativity, we find that the beads have initial velocities 0 and —0.8 in 
this frame, and if we compute their total energy-momentum vector, 
on surface of simultaneity 2 in figure i, we get p' = (2.67,-1.33). 
This is exactly what we would have gotten by taking the original 
vector p and pushing it through a Lorentz transformation. That is, 
the energy-momentum vector seems to be acting like a good four- 
vector, even though the system has finite spatial extent and contains 
parts that move at relativistic speeds. In particular, this implies that 
the system has the same mass m = 2.31 as in o, since m is the
norm of the p vector, and the norm of a vector stays the same under 
a Lorentz transformation. 
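
The numbers in this example are easy to reproduce. The sketch below (illustrative function names, units with c = 1) builds the total energy-momentum vector of the two unit-mass beads in frame o, recomputes it in o' from the relativistically combined velocities, and compares the result with a direct Lorentz transformation of p.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def add_velocities(u, v):
    """Relativistic combination of collinear velocities (c = 1)."""
    return (u + v) / (1.0 + u * v)

def energy_momentum(m, v):
    """Energy-momentum vector (E, p_x) of a particle of mass m and velocity v."""
    g = gamma(v)
    return (m * g, m * g * v)

def total(vectors):
    return (sum(E for E, _ in vectors), sum(p for _, p in vectors))

def boost(p, v):
    """Components of p in a frame moving at velocity v relative to the original frame."""
    E, px = p
    g = gamma(v)
    return (g * (E - v * px), g * (px - v * E))

def norm(p):
    return math.sqrt(p[0] ** 2 - p[1] ** 2)

u = 0.5          # bead speeds in frame o
v_frame = 0.5    # velocity of o' relative to o

p_o = total([energy_momentum(1.0, +u), energy_momentum(1.0, -u)])
bead_velocities_prime = [add_velocities(+u, -v_frame), add_velocities(-u, -v_frame)]
p_prime = total([energy_momentum(1.0, w) for w in bead_velocities_prime])
p_boosted = boost(p_o, v_frame)

print("p in o:            ", tuple(round(x, 2) for x in p_o))
print("bead velocities o':", [round(w, 2) for w in bead_velocities_prime])
print("p' from the beads: ", tuple(round(x, 2) for x in p_prime))
print("p' from the boost: ", tuple(round(x, 2) for x in p_boosted))
print("mass in o, o':     ", round(norm(p_o), 2), round(norm(p_prime), 2))
```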

But now consider surface 3, which, like 2, observer o' considers 
to be a surface of simultaneity. At this time, o' says that both beads 
are moving to the left. Between time 2 and time 3, o' says that 
the system’s total momentum has changed, while its total mass- 
energy stayed constant. Its mass is different, and the total energy- 
momentum vector p' at time 3 is not related by a Lorentz trans- 
formation to the value of p at any time in frame o. The reason for 
this misbehavior is that the right-hand bead has bounced off of the 
right end of the wire, but because o and o' have different opinions 
about simultaneity, o' says that there has not yet been any matching 
collision for the bead on the left. 

But all of these difficulties arise only because we have left some- 
thing out. When the right-hand bead bounces off of the right-hand 
end of the wire, this is a collision between the bead and the wire. 
After the collision, the wire rebounds to the right (or a vibration is 
created in it). By ignoring the rebound of the wire, we have vio- 
lated the law of conservation of momentum. If we take into account 
the momentum imparted to the wire, then the energy-momentum 
vector of the whole system is conserved, and must therefore be the 
same at 2 and 3.


The upshot of all this is that $E = mc^2$ and the four-vector
nature of p are both valid for systems with finite spatial extent, 
provided that the systems are isolated. “Isolated” means simply 
that we should not gratuitously ignore anything such as the wire 
in this example that exchanges energy-momentum with our system. 
To give a general proof of this, it will be helpful to develop the 
idea of the stress-energy tensor (section 9.2, p. 179), which allows 
a succinct statement of what we mean by conservation of energy- 
momentum (subsection 9.2.1). A proof is given in section 9.3.4 on 
p. 191. 


4.5 ★ Force

Force is a concept that is seldom needed in relativity, and that’s 
why this section is optional. 

4.5.1 Four-force 

By analogy with Newtonian mechanics, we define a relativistic 
force vector 

F = ma, 

where a is the acceleration four-vector (sec 3.5, p. 60) and m is the 
mass of a particle that has that acceleration as a result of the force 
F. This is equivalent to 

$$F = \frac{dp}{d\tau},$$

where p is the momentum vector of the particle and $\tau$ its proper time. Since
the timelike part of p is the particle’s mass-energy, the timelike
component of the force is related to the power expended by the
force. These definitions only work for massive particles, since for a
massless particle we can’t define a or $\tau$. F has been defined in terms
of Lorentz invariants and four-vectors, and therefore it transforms
as a god-fearing four-vector itself.

4.5.2 The force measured by an observer 

The trouble with all this is that F isn’t what we actually mea- 
sure when we measure a force, except if we happen to be in a frame 
of reference that momentarily coincides with the rest frame of the 
particle. As with velocity and acceleration (section 3.7, p. 66), we 
have a four-vector that has simple, standard transformation prop-
erties, but a different $F_o$, which is what is actually measured by the
observer o. It’s defined as

$$F_o = \frac{dp}{dt},$$

with a $dt$ in the denominator rather than a $d\tau$. In other words,
it measures the rate of transfer of momentum according to the ob-
server, whose time coordinate is t, not $\tau$, unless the observer hap-
pens to be moving along with the particle. Unlike the three-vectors
$v_o$ and $a_o$, whose timelike components are zero by definition ac-
cording to observer o, $F_o$ usually has a nonvanishing timelike com-
ponent, which is the rate of change of the particle’s mass-energy,
i.e., the power. We can refer to the spacelike part of $F_o$ as the
three-force.


The following two examples show that an object moving at rel- 
ativistic speeds has less inertia in the transverse direction than in 
the longitudinal one. A corollary is that the three-acceleration need 
not be parallel to the three-force. 


Circular motion Example 13 

For a particle in uniform circular motion, $\gamma$ is constant, and we
have

$$F_o = \frac{d}{dt}(m\gamma v) = m\gamma\frac{dv}{dt}.$$

The particle’s mass-energy is constant, so the timelike compo-
nent of $F_o$ does happen to be zero in this example. In terms of
the three-vectors $v_o$ and $a_o$ defined in section 3.7, we have

$$F_o = m\gamma\frac{dv_o}{dt} = m\gamma a_o,$$

which is greater than the Newtonian value by the factor $\gamma$. As a
practical example, in a cathode ray tube (CRT) such as the tube in
an old-fashioned oscilloscope or television, a beam of electrons is
accelerated up to relativistic speed (problem 2, p. 111). To paint a
picture on the screen, the beam has to be steered by transverse
forces, and since the deflection angles are small, the world-line
of the beam is approximately that of uniform circular motion. The
force required to deflect the beam is greater by a factor of $\gamma$ than
would have been expected according to Newton’s laws.

Linear motion Example 14 

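For accelerated linear motion in the x direction, ignoring y and z,
we have a velocity vector

$$v = \frac{dr}{d\tau}$$

whose x component is $\gamma v$. Then

$$\begin{aligned}
F_{o,x} &= m\frac{d(\gamma v)}{dt} \\
        &= m\frac{d\gamma}{dt}v + m\gamma\frac{dv}{dt} \\
        &= m\frac{d\gamma}{dv}\frac{dv}{dt}v + m\gamma a \\
        &= m(v^2\gamma^3 a + \gamma a) \\
        &= ma\gamma^3.
\end{aligned}$$

The particle’s apparent inertia is increased by a factor of $\gamma^3$ due
to relativity.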
The results of examples 13 and 14 can be combined as follows:

$$F_o = m\gamma a_{o,\perp} + m\gamma^3 a_{o,\parallel},$$

where the subscripts $\perp$ and $\parallel$ refer to the parts of $a_o$ perpendicular
and parallel to $v_o$.
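
A small sketch can make the corollary concrete: because the transverse and longitudinal parts of the acceleration are weighted by γ and γ³ respectively, the three-force generally points in a different direction from the three-acceleration. The code works in two spatial dimensions with c = 1; the function name and the sample numbers are illustrative assumptions.

```python
import math

def three_force(m, v, a):
    """Three-force m*gamma*a_perp + m*gamma^3*a_par for velocity v and acceleration a.

    v and a are 2-tuples of spatial components, in units where c = 1.
    """
    vx, vy = v
    ax, ay = a
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return (m * ax, m * ay)        # at rest, gamma = 1 and the Newtonian form holds
    g = 1.0 / math.sqrt(1.0 - speed_sq)
    scale = (ax * vx + ay * vy) / speed_sq
    a_par = (scale * vx, scale * vy)            # part of a parallel to v
    a_perp = (ax - a_par[0], ay - a_par[1])     # part of a perpendicular to v
    return (m * g * a_perp[0] + m * g ** 3 * a_par[0],
            m * g * a_perp[1] + m * g ** 3 * a_par[1])

# Sample case: motion along x at v = 0.8, acceleration at 45 degrees to the velocity.
accel = (1.0, 1.0)
force = three_force(1.0, (0.8, 0.0), accel)
cross = accel[0] * force[1] - accel[1] * force[0]   # zero only if force is parallel to accel
print("three-force:", force)
print("a x F (nonzero means not parallel):", cross)
```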

4.5.3 Transformation of the force measured by an observer 

Define a frame of reference o for the inertial frame of reference of 
an observer who does happen to be moving along with the particle 
at a particular instant in time. Then t is the same as $\tau$, and $F_o$ the
same as F. In this frame, the particle is momentarily at rest, so the
work being done on it vanishes, and the timelike components of $F_o$
and F are both zero.

Suppose we do a Lorentz transformation from o to a new frame 
o', and suppose the boost is parallel to $F_o$ and F (which are both
purely spatial in frame o). Call this direction x. Then $dp = (dp_t, dp_x) = (0, dp_x)$
transforms to $dp' = (-\gamma v\,dp_x, \gamma\,dp_x)$, so that
$F_{o',x} = dp'_x/dt' = (\gamma\,dp_x)/(\gamma\,dt) = F_{o,x}$. The two factors of $\gamma$
cancel, and we find that $F_{o',x} = F_{o,x}$.

Now let’s do the case where the force is in the y direction, per-
pendicular to the boost. The Lorentz transformation doesn’t change
$dp_y$, so $F_{o',y} = dp'_y/dt' = dp_y/(\gamma\,dt) = F_{o,y}/\gamma$.

The summary of our results is as follows. Let $F_o$ be the force
acting on a particle, as measured in a frame instantaneously comov-
ing with the particle. Then in a frame of reference moving relative
to this one, we have

$$F_{o',\parallel} = F_{o,\parallel} \quad\text{and}\quad F_{o',\perp} = \frac{F_{o,\perp}}{\gamma},$$

where $\parallel$ indicates the direction parallel to the relative velocity of the
two frames, and $\perp$ a direction perpendicular to it.
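
The transformation rule can be checked by carrying out the boost explicitly. In this sketch (illustrative names, c = 1, boost along x), the momentum transferred in the comoving frame during a time dt is Lorentz-transformed, and the result is divided by the dilated time dt' = γ dt, which holds because the particle is momentarily at rest in the comoving frame. Since the transformation is linear, a finite dt = 1 serves as a stand-in for the infinitesimal.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def transform_force(force, v, dt=1.0):
    """Force measured in a frame moving at velocity v along x relative to the comoving frame.

    force = (Fx, Fy) as measured in the instantaneously comoving frame.
    """
    fx, fy = force
    g = gamma(v)
    d_energy = 0.0                 # no work is done on a particle momentarily at rest
    dpx, dpy = fx * dt, fy * dt    # momentum transferred during dt in the comoving frame
    dpx_prime = g * (dpx - v * d_energy)
    dpy_prime = dpy                # components transverse to the boost are unchanged
    dt_prime = g * dt              # dx = 0 for the particle, so dt' = gamma * dt
    return (dpx_prime / dt_prime, dpy_prime / dt_prime)

fx_prime, fy_prime = transform_force((3.0, 4.0), 0.6)
print("parallel component:     ", fx_prime)   # unchanged
print("perpendicular component:", fy_prime)   # reduced by 1/gamma
```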

4.5.4 Work 

Consider the one-dimensional version of the three-force, $F = dp/dt$.
An advantage of this quantity is that it allows us to use
the Newtonian form of the (one-dimensional) work-kinetic energy
relation $dE/dx = F$ without correction. Proof:

$$\frac{dE}{dx} = \frac{dE}{dp}\,\frac{dp}{dt}\,\frac{dt}{dx} = \frac{dE}{dp}\,\frac{F}{v}$$

By implicit differentiation of the definition of mass, we find that
$dE/dp = p/E$, and this in turn equals v by the identity proved in
example 7, p. 90. This leads to the claimed result, which is valid for
both massless and material particles.
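
The key step, dE/dp = p/E = v, can be spot-checked with a finite-difference derivative. This is only a numerical sanity check under the relation E² = p² + m² (c = 1); the sample masses and momenta are arbitrary.

```python
import math

def energy(p, m):
    """E = sqrt(p^2 + m^2), from the definition of mass with c = 1."""
    return math.sqrt(p * p + m * m)

def dE_dp(p, m, h=1e-6):
    """Central-difference estimate of dE/dp."""
    return (energy(p + h, m) - energy(p - h, m)) / (2.0 * h)

checks = []
for m, p in ((1.0, 0.5), (1.0, 3.0), (0.0, 2.0)):   # the last case is massless
    v = p / energy(p, m)          # identity v = p/E
    slope = dE_dp(p, m)
    checks.append((slope, v))
    # dE/dx = (dE/dp) * F / v, so the ratio slope / v is dE/dx divided by F.
    print(f"m = {m}, p = {p}: dE/dp = {slope:.6f}, v = {v:.6f}, (dE/dx)/F = {slope / v:.6f}")
```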
4.6 ★ Two applications

4.6.1 The Stefan-Boltzmann law 

In 1818, Dulong and Petit analyzed experimental data to find the 
empirical and totally incorrect law $P \propto \exp[T/(13.5\ \mathrm{K})]$, relating the
temperature T of a body to the power it emits as electromagnetic 
radiation. (To see that it must be wrong, note that it doesn’t vanish 
at absolute zero.) It was accepted until 1884, when Boltzmann 
corrected a systematic error in their analysis of the data, and offered 
a theoretical argument for the correct law, $P \propto T^4$. This law is
extremely important in a variety of applications including global 
warming, stellar structure, cosmology, and warming your hands by 
the glow of a fire. Modern physics students usually come across it as 
a corollary in the story of the development of the quantum theory 
by Planck, Einstein, et al., but as we will see below, it is a purely
classical result, depending only on relativity and thermodynamics. 

Consider an insulated cubical box of volume V containing ra- 
diation in thermal equilibrium. We let it expand uniformly with 
constant entropy, so that all three sides grow by the same factor a. 
(This is exactly what happens in cosmological expansion.) Boltz- 
mann’s clever idea was that the radiation could be treated like the 
working fluid in a heat engine. 

By the relativistic relation between momentum and energy, the 
energy and momentum of a ray of light are equal (in natural units). 
Therefore if we lived in a one-dimensional world, the pressure $p$
exerted by our radiation on the walls of its one-dimensional vessel
would equal its energy density $\rho$. Because we live in a
three-dimensional world, and the momenta along the three axes are in
equilibrium, we have instead $p = \rho/3$. This is called the equation
of state of the radiation. In cosmology, other components of the 
universe, such as galaxies, have equations of state with some factor 
other than 1/3 in front. 

As the box expands, the pressure of the radiation on the walls 
does work $W$. By conservation of energy, we have $dU + dW = 0$,
where $U$ is the energy of the radiation. Substituting $U = \rho V$ and
$dW = p\,dV$, we obtain $d(\rho V) + p\,dV = 0$. Applying the product
rule, separating variables, and integrating, we find $\rho \propto a^{-4}$. Here
the exponent 4 is simply the number of spatial dimensions plus one. 
Exactly the same relation held in the early universe, which was 
dominated by radiation rather than matter. 
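
Written out step by step, with the equation of state (pressure equal to one third of the energy density $\rho$) and $V \propto a^3$, the integration might go as follows. The last line is a generalization to $d$ spatial dimensions, added here only as a consistency check on the remark about the exponent; it is not part of the original argument.

```latex
\begin{align*}
  d(\rho V) + p\,dV &= 0 \\
  V\,d\rho + \rho\,dV + \tfrac{1}{3}\rho\,dV &= 0 \\
  \frac{d\rho}{\rho} &= -\frac{4}{3}\,\frac{dV}{V} \\
  \rho &\propto V^{-4/3} \propto a^{-4} \\
  \text{in $d$ dimensions, with } p = \rho/d,\ V \propto a^d: \quad
  \rho &\propto a^{-(d+1)}
\end{align*}
```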

Because there is no heat transfer, the entropy is constant. En- 
tropy can be interpreted as a measure of the number of accessible 
states, and because state-counting doesn’t depend on scaling, the 
occupied modes of vibration stay the same. Thus, the wavelengths 
simply grow in proportion to $a$. (A more formal and rigorous version
of this argument is called the adiabatic theorem, proved by
Born and Fock in 1928.) Although this is a classical argument,
we can save some work at this point by appealing to quantum mechanics
for a shortcut. Since a photon has an energy proportional to $1/\lambda$, we have
$U \propto 1/\lambda \propto 1/a$. The temperature of the radiation is proportional
to the average energy per degree of freedom, so we have $T \propto 1/a$ as
well.

Therefore $\rho \propto T^4$. This is equivalent to the Stefan-Boltzmann
result, because light rays travel at the fixed speed c, and therefore 
the flux of radiation is proportional to the energy density. Even 
though this final proportionality is classical in nature, the value 
of the proportionality constant depends on Planck’s constant, and 
is quantum-mechanical. A derivation is given, for example, in the 
Feynman Lectures on Physics, section 1-41. 

4.6.2 Degenerate matter 

The properties of the momentum vector have surprising impli- 
cations for matter subject to extreme pressure, as in a star that 
uses up all its fuel for nuclear fusion and collapses. These implica- 
tions were initially considered too exotic to be taken seriously by 
astronomers. 

An ordinary, smallish star such as our own sun has enough hy- 
drogen to sustain fusion reactions for billions of years, maintaining 
an equilibrium between its gravity and the pressure of its gases. 
When the hydrogen is used up, it has to begin fusing heavier el- 
ements. This leads to a period of relatively rapid fluctuations in 
structure. Nuclear fusion proceeds up until the formation of ele- 
ments as heavy as oxygen (Z = 8), but the temperatures are not 
high enough to overcome the strong electrical repulsion of these nu- 
clei to create even heavier ones. Some matter is blown off, but finally 
nuclear reactions cease and the star collapses under the pull of its 
own gravity. 

To understand what happens in such a collapse, we have to un- 
derstand the behavior of gases under very high pressures. In gen- 
eral, a surface area A within a gas is subject to collisions in a time t 
from the n particles occupying the volume V = Avt , where v is the 
typical velocity of the particles. The resulting pressure is given by 
P ~ npv/V, where p is the typical momentum. 


Nondegenerate gas: In an ordinary gas such as air, the particles
are nonrelativistic, so $v = p/m$, and the thermal energy
per particle is $p^2/2m \sim kT$, so the pressure is $P \sim nkT/V$.

Nonrelativistic, degenerate gas: When a fermionic gas is subject
to extreme pressure, the dominant effects creating pressure
are quantum-mechanical. Because of the Pauli exclusion
principle, the volume available to each particle is $\sim V/n$,
so its wavelength is no more than $\sim (V/n)^{1/3}$, leading to
$p = h/\lambda \sim h(n/V)^{1/3}$. If the speeds of the particles are still
nonrelativistic, then $v = p/m$ still holds, so the pressure becomes
$P \sim (h^2/m)(n/V)^{5/3}$.

Relativistic, degenerate gas: If the compression is strong enough
to cause highly relativistic motion for the particles, then $v \sim c$,
and the result is $P \sim hc(n/V)^{4/3}$.
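
The three regimes can be compared numerically. The following is only a rough sketch: the function names are made up for illustration, the constants are rounded SI values, and all factors of order unity are dropped, as in the estimates above. The helper `crossover_number_density` is an added convenience that finds the density at which the typical momentum reaches $mc$, where the two degenerate estimates coincide.

```python
# Order-of-magnitude pressure estimates for the three regimes listed above.
# SI units; numerical factors of order unity are dropped, as in the text.

H = 6.626e-34    # Planck's constant, J s
C = 2.998e8      # speed of light, m/s
K_B = 1.381e-23  # Boltzmann's constant, J/K


def pressure_nondegenerate(number_density, temperature):
    """P ~ (n/V) k T for an ordinary, nonrelativistic gas."""
    return number_density * K_B * temperature


def pressure_degenerate_nonrelativistic(number_density, mass):
    """P ~ (h^2/m) (n/V)^(5/3)."""
    return (H ** 2 / mass) * number_density ** (5.0 / 3.0)


def pressure_degenerate_relativistic(number_density):
    """P ~ h c (n/V)^(4/3)."""
    return H * C * number_density ** (4.0 / 3.0)


def crossover_number_density(mass):
    """Density at which p ~ h (n/V)^(1/3) reaches m c, so that the two
    degenerate estimates give the same pressure."""
    return (mass * C / H) ** 3


if __name__ == "__main__":
    m_electron = 9.109e-31  # kg
    n_cross = crossover_number_density(m_electron)
    print(f"crossover electron density ~ {n_cross:.1e} per m^3")
```

Because the relativistic pressure grows only as the 4/3 power of the density rather than the 5/3 power, a relativistic degenerate gas is "softer" than a nonrelativistic one.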

As a star with the mass of our sun collapses, it reaches a point 
at which the electrons begin to behave as a degenerate gas, and 
the collapse stops. The resulting object is called a white dwarf. A 
white dwarf should be an extremely compact body, about the size 
of the Earth. Because of its small surface area, it should emit very 
little light. In 1910, before the theoretical predictions had been 
made, Russell, Pickering, and Fleming discovered that 40 Eridani B 
had these characteristics. Russell recalled: “I knew enough about 
it, even in these paleozoic days, to realize at once that there was 
an extreme inconsistency between what we would then have called 
‘possible’ values of the surface brightness and density. I must have 
shown that I was not only puzzled but crestfallen, at this exception 
to what looked like a very pretty rule of stellar characteristics; but 
Pickering smiled upon me, and said: ‘It is just these exceptions 
that lead to an advance in our knowledge,’ and so the white dwarfs 
entered the realm of study!” 

S. Chandrasekhar showed in the 1930s that there was an upper
limit to the mass of a white dwarf. We will recapitulate his calculation
briefly in condensed order-of-magnitude form. The pressure
at the core of the star is $P \sim \rho g r \sim GM^2/r^4$, where $M$ is the total
mass of the star. The star contains roughly equal numbers of neutrons,
protons, and electrons, so $M = Knm$, where $m$ is the mass of
the electron, $n$ is the number of electrons, and $K \approx 4000$. For stars
near the limit, the electrons are relativistic. Setting the pressure at
the core equal to the degeneracy pressure of a relativistic gas, we
find that the Chandrasekhar limit is $\sim (hc/G)^{3/2}(Km)^{-2} \approx 6M_\odot$.
A less sloppy calculation gives something more like $1.4M_\odot$.
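
The crude estimate is easy to check numerically. This is a minimal sketch using rounded SI constants; the function name is hypothetical, and the formula is just the order-of-magnitude expression above, not Chandrasekhar's full calculation.

```python
# Order-of-magnitude Chandrasekhar limit, M ~ (hc/G)^(3/2) (K m)^(-2).

H = 6.626e-34           # Planck's constant, J s
C = 2.998e8             # speed of light, m/s
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2
M_ELECTRON = 9.109e-31  # kg
M_SUN = 1.989e30        # kg
K = 4000.0              # stellar mass per electron, in electron masses


def chandrasekhar_estimate(k=K, m=M_ELECTRON):
    """Crude mass limit in kg for a star supported by relativistic
    degeneracy pressure, with stellar mass k*m per degenerate particle."""
    return (H * C / G) ** 1.5 / (k * m) ** 2


if __name__ == "__main__":
    ratio = chandrasekhar_estimate() / M_SUN
    print(f"crude limit ~ {ratio:.1f} solar masses")
```

The result depends on the inverse square of the mass per degenerate particle, which is the scaling reused below for neutron stars.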

What happens to a star whose mass is above the Chandrasekhar 
limit? As nuclear fusion reactions flicker out, the core of the star be- 
comes a white dwarf, but once fusion ceases completely this cannot 
be an equilibrium state. Now consider the nuclear reactions 

$$n \to p + e^- + \bar{\nu}$$
$$p + e^- \to n + \nu,$$

which happen due to the weak nuclear force. The first of these
releases 0.8 MeV, and a free neutron decays this way with a mean
lifetime of about 15 minutes. This explains
why free neutrons are not observed in significant numbers in our
universe, e.g., in cosmic rays. The second reaction requires an input
of 0.8 MeV of energy, so a free hydrogen atom is stable. The white
dwarf contains fairly heavy nuclei, not individual protons, but similar
considerations would seem to apply. A nucleus can absorb an
electron and convert a proton into a neutron, and in this context the
process is called electron capture. Ordinarily this process will only
occur if the nucleus is neutron-deficient; once it reaches a
neutron-to-proton ratio that optimizes its binding energy, electron capture
cannot proceed without a source of energy to make the reaction go.
In the environment of a white dwarf, however, there is such a source.
The annihilation of an electron opens up a hole in the “Fermi sea.”
There is now a state into which another electron is allowed to drop
without violating the exclusion principle, and the effect cascades
upward. In a star with a mass above the Chandrasekhar limit, this
process runs to completion, with every proton being converted into a
neutron. The result is a neutron star, which is essentially an atomic
nucleus (with Z = 0) with the mass of a star!

j / Subrahmanyan Chandrasekhar

Observational evidence for the existence of neutron stars came 
in 1967 with the detection by Bell and Hewish at Cambridge of a 
mysterious radio signal with a period of 1.3373011 seconds. The sig- 
nal’s observability was synchronized with the rotation of the earth 
relative to the stars, rather than with legal clock time or the earth’s 
rotation relative to the sun. This led to the conclusion that its origin 
was in space rather than on earth, and Bell and Hewish originally 
dubbed it LGM-1 for “little green men.” The discovery of a second 
signal, from a different direction in the sky, convinced them that it 
was not actually an artificial signal being generated by aliens. Bell 
published the observation as an appendix to her PhD thesis, and 
it was soon interpreted as a signal from a neutron star. Neutron 
stars can be highly magnetized, and because of this magnetization 
they may emit a directional beam of electromagnetic radiation that 
sweeps across the sky once per rotational period — the “lighthouse 
effect.” If the earth lies in the plane of the beam, a periodic signal 
can be detected, and the star is referred to as a pulsar. It is fairly 
easy to see that the short period of rotation makes it difficult to 
explain a pulsar as any kind of less exotic rotating object. In the 
approximation of Newtonian mechanics, a spherical body of density
$\rho$, rotating with a period $T = \sqrt{3\pi/G\rho}$, has zero apparent gravity
at its equator, since gravity is just strong enough to accelerate an
object so that it follows a circular trajectory above a fixed point on
the surface (problem 17). In reality, astronomical bodies of planetary
size and greater are held together by their own gravity, so we
have $T \gtrsim 1/\sqrt{G\rho}$ for any body that does not fly apart spontaneously
due to its own rotation. In the case of the Bell-Hewish pulsar, this
implies $\rho \gtrsim 10^{10}\ \mathrm{kg/m^3}$, which is far larger than the density of normal
matter, and also 10-100 times greater than the typical density
of a white dwarf near the Chandrasekhar limit.
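
The density bound can be checked with a few lines of arithmetic. This sketch uses the Newtonian breakup condition quoted above; the function names are made up for illustration. It keeps the factor of $3\pi$, so it gives a somewhat larger bound than the rounded order-of-magnitude figure in the text.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def breakup_period(density):
    """Rotation period (s) at which a uniform sphere of the given density
    (kg/m^3) has zero apparent gravity at its equator."""
    return math.sqrt(3.0 * math.pi / (G * density))


def minimum_density(period):
    """Smallest density (kg/m^3) for which a self-gravitating sphere
    rotating with the given period (s) does not fly apart."""
    return 3.0 * math.pi / (G * period ** 2)


if __name__ == "__main__":
    rho_min = minimum_density(1.3373011)  # period of the Bell-Hewish pulsar
    print(f"minimum density ~ {rho_min:.1e} kg/m^3")
```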

An upper limit on the mass of a neutron star can be found in a
manner entirely analogous to the calculation of the Chandrasekhar
limit. The only difference is that the mass of a neutron is much
greater than the mass of an electron, and the neutrons are the only
particles present, so there is no factor of $K$. Assuming the more
precise result of $1.4M_\odot$ for the Chandrasekhar limit rather than
our sloppy one, and ignoring the interaction of the neutrons via the
strong nuclear force, we can infer an upper limit on the mass of a
neutron star:

$$1.4M_\odot \left(\frac{Km_e}{m_n}\right)^2 \approx 5M_\odot$$
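
The rescaling can be made explicit. As a sketch, write $\mu$ (a symbol introduced here for illustration, not used in the text) for the stellar mass per degenerate particle, so that the order-of-magnitude limit derived for white dwarfs reads $M_{\max} \sim (hc/G)^{3/2}\mu^{-2}$:

```latex
\begin{align*}
  \text{white dwarf:} \quad \mu &= K m_e \\
  \text{neutron star:} \quad \mu &= m_n \\
  \frac{M_{\max}(\text{neutron star})}{M_{\max}(\text{white dwarf})}
    &= \left(\frac{K m_e}{m_n}\right)^2
\end{align*}
```

Since each electron in the white dwarf is accompanied by roughly one proton and one neutron, $Km_e$ is about $2m_n$, and the factor is roughly 4.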

The theoretical uncertainties in such an estimate are fairly large.
Tolman, Oppenheimer, and Volkoff originally estimated it in 1939
as $0.7M_\odot$, whereas modern estimates are more in the range of 1.5
to $3M_\odot$. These are significantly lower than our crude estimate of
$5M_\odot$, mainly because the attractive nature of the strong nuclear
force tends to pull the star toward collapse. Unambiguous results
are presently impossible because of uncertainties in extrapolating
the behavior of the strong force from the regime of ordinary nuclei,
where it has been relatively well parametrized, into the exotic
environment of a neutron star, where the density is significantly different
and no protons are present. There are a variety of effects that may
be difficult to anticipate or to calculate. For example, Brown and
Bethe found in 1994⁹ that it might be possible for the mass limit to
be drastically revised because of the process $e^- \to K^- + \nu_e$, which is
impossible in free space due to conservation of energy, but might be
possible in a neutron star. Observationally, nearly all neutron stars
seem to lie in a surprisingly small range of mass, between 1.3 and
$1.45M_\odot$, but in 2010 a neutron star with a mass of $1.97 \pm 0.04M_\odot$
was discovered, ruling out most neutron-star models that included
exotic matter.¹⁰

For stars with masses above the Tolman-Oppenheimer-Volkoff
limit, it seems likely, both on theoretical and observational grounds,
that we end up with a black hole: an object with an event horizon
(cf. p. 62) that cuts its interior off from the rest of the universe.

9 H.A. Bethe and G.E. Brown, “Observational constraints on the maximum
neutron star mass,” Astrophys. J. 445 (1995) L129. G.E. Brown and H.A.
Bethe, “A Scenario for a Large Number of Low-Mass Black Holes in the Galaxy,”
Astrophys. J. 423 (1994) 659. Both papers are available at adsabs.harvard.edu.
10 Demorest et al., arxiv.org/abs/1010.5788v1.

4.7 ★ Tachyons and FTL

4.7.1 A defense in depth

Let’s summarize some ideas about faster-than-light (FTL,
superluminal) motion in relativity:


1. Superluminal transmission of information would violate causality,
since it would allow a causal relationship between events
that were spacelike in relation to one another, and the
time-ordering of such events is different according to different
observers. Since we never seem to observe causality to be
violated, we suspect that superluminal transmission of information
is impossible. This leads us to interpret the metric in
relativity as being fundamentally a statement of possible cause
and effect relationships between events.

2. We observe the invariant mass defined by $m^2 = E^2 - p^2$ to be
a fixed property of all objects. Therefore we suspect that it is
not possible for an object to change from having $|E| > |p|$ to
having $|E| < |p|$.

3. No continuous process of acceleration can bring an observer 
from v < c to v > c (see section 3.3). Since it’s possible to 
build an observer out of material objects, it seems that it’s 
impossible to get a material object past c by a continuous 
process of acceleration. 

4. If superluminal motion were possible, then one might also ex- 
pect superluminal observers to be possible. But FTL frames 
of reference are kinematically impossible in 3 + 1 dimensions 
(section 3.8, p. 69). 

Thus special relativity seems to have a defense in depth against 
superluminal motion. 
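
Item 3 can be illustrated numerically. The sketch below assumes the standard rule for combining collinear velocities, $(u + v)/(1 + uv)$ in units where $c = 1$, which is not restated in this section; the function names are hypothetical. Applying the same boost over and over brings the speed ever closer to 1 without reaching it.

```python
def combine_velocities(u, v):
    """Relativistic combination of two collinear velocities, in units of c."""
    return (u + v) / (1.0 + u * v)


def boost_repeatedly(step, count):
    """Start at rest and apply the same boost `count` times, returning the
    velocity (relative to the original frame) after each boost."""
    velocity = 0.0
    history = []
    for _ in range(count):
        velocity = combine_velocities(velocity, step)
        history.append(velocity)
    return history


if __name__ == "__main__":
    for i, v in enumerate(boost_repeatedly(0.5, 20), start=1):
        print(f"after {i:2d} boosts: v = {v:.12f}")
```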

Based on 2, FTL motion would be a property of an exotic form 
of matter built out of hypothetical particles with imaginary mass. 
Such particles are called tachyons. An imaginary mass is not absurd 
on its face, because experiments directly measure E and p, not m. 
E.g., if we put a tachyon on a scale and weighed it, we would be 
measuring its mass-energy E. 

The weakest of these arguments is 1, since as described in sec- 
tion 2.1, we have no strong reasons for believing in causality as an 
overarching principle of physics. It would be exciting if we could 
detect tachyons in particle accelerator experiments or as naturally 
occurring radiation. Perhaps we could even learn to transmit and 
receive tachyon signals artificially, allowing us to send ourselves
messages from the future! This possibility was pointed out in 1917 by
Tolman¹¹ and is referred to as the “tachyonic antitelephone.”¹²

11 www.archive.org/details/theoryrelativmot00tolmrich
12 Bilaniuk et al. claimed in a 1962 paper to have found a reinterpretation 
that eliminated the causality violation, but their interpretation requires that 
rates of tachyon emission in one frame be related to rates of tachyon absorption 
in another frame, which in my opinion is equally problematic, since rates of 
absorption should depend on the environment, whereas rates of emission should 
depend on the emitter; the causality violation has simply been described in 
different words, but not eliminated. For a different critique, see Benford, Book, 
and Newcomb, “The tachyonic antitelephone,” Physical Review D 2 (1970) 263. 
Scans of the paper can be found online.

If we’re willing to let go of causality, then we just need to make
sure that our tachyons comply with items 3 and 4 above. Argument 
4 tells us that the laws of physics must conspire to make it impossible 
to build an observer out of tachyons; this is not entirely implausible, 
since there are other classes of particles such as photons that can’t 
be used to construct observers. 

4.7.2 Experiments to search for tachyons 

Experimental searches are made more difficult by conflicting
theoretical claims as to whether tachyons should be charged or
neutral, whether they should have integral or half-integral spin, and
whether the normal spin-statistics relation even applies to them.¹³
If charged, it is uncertain whether and under what circumstances
they would emit Cerenkov radiation.

The most obvious experimental signature of tachyons would be 
propagation at speeds greater than c. Negative results were reported 
by Murthy and later by Clay,¹⁴ who studied air showers generated
by cosmic rays to look for precursor particles that arrived before the 
first photons. 

One could also look for particles with $|p| > |E|$. Alväger and
Erman, in a 1965 experiment, studied the beta decay of $^{170}$Tm, using
a spectrometer to measure the momentum of charged radiation and
a solid state detector to determine energy. An upper limit of one
tachyon per $10^4$ beta particles was inferred.

If tachyons are neutral, then they might be difficult to detect 
directly, but it might be possible to infer their existence indirectly 
through missing energy-momentum in reactions. This is how the 
neutrino was first discovered. Baltay et al.¹⁵ searched for reactions
such as $\bar{p} + p \to \pi^+ + \pi^- + t$, with $t$ being a neutral tachyon, by
measuring the momenta of all the other initial and final particles
and looking for events in which the missing energy-momentum was
spacelike. They put upper limits of $\sim 10^{-3}$ on the branching ratios
of this and several other reactions leading to production of single 
tachyons or tachyon-antitachyon pairs. 
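
The missing energy-momentum test can be sketched in a few lines. This is only an illustration of the bookkeeping, not a reconstruction of the actual analysis: the function names are hypothetical, four-vectors are plain tuples $(E, p_x, p_y, p_z)$ in natural units, and the numbers in the usage example are invented.

```python
def mass_squared(energy, px, py, pz):
    """Invariant m^2 = E^2 - p^2 of an energy-momentum vector (c = 1)."""
    return energy ** 2 - (px ** 2 + py ** 2 + pz ** 2)


def missing_four_momentum(initial, final):
    """Total initial minus total final energy-momentum. Each argument is a
    non-empty list of (E, px, py, pz) tuples for the detected particles."""
    total_in = [sum(components) for components in zip(*initial)]
    total_out = [sum(components) for components in zip(*final)]
    return tuple(a - b for a, b in zip(total_in, total_out))


def classify(four_momentum, tolerance=1e-9):
    """Label a vector as timelike (m^2 > 0), lightlike (m^2 = 0), or
    spacelike (m^2 < 0, the tachyonic signature)."""
    m2 = mass_squared(*four_momentum)
    if m2 > tolerance:
        return "timelike"
    if m2 < -tolerance:
        return "spacelike"
    return "lightlike"


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    initial = [(10.0, 0.0, 0.0, 3.0)]
    final = [(4.0, 1.0, 0.0, 0.0), (5.0, 0.0, 0.0, 1.0)]
    missing = missing_four_momentum(initial, final)
    print(missing, classify(missing))
```

In a real experiment the tolerance would have to reflect the measurement uncertainties on every detected particle, which is why such searches yield upper limits rather than sharp yes-or-no answers.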

For a long time after the discovery of the neutrino, very little 
was known about its mass, so it was consistent with the experimen- 
tal evidence to imagine that one or more species of neutrinos were 
tachyons, and Chodos et al. made such speculations in 1985. A 
brief flurry of reawakened interest in tachyons was occasioned by 
a 2011 debacle in which the particle-physics experiment OPERA 
mistakenly reported faster-than-light propagation of neutrinos; the
anomaly was later found to be the result of a loose connection on
a fiber-optic cable plus a miscalibrated oscillator. An experiment
called KATRIN, currently nearing the start of operation at Karlsruhe,
will provide the first direct measurement of the mass of the
neutrino, by measuring very precisely the maximum energy of the
electrons emitted in the decay of tritium, $^3\mathrm{H} \to {}^3\mathrm{He} + e^- + \bar{\nu}_e$.
Conservation of energy then allows one to determine the minimum
energy of the antineutrino, which is related to its mass and momentum
by $m^2 = E^2 - p^2$. Because $m^2$ appears in this equation, the
experiment really measures $m^2$, not $m$, and a result of $m^2 < 0$ would
bring the tachyonic neutrino back from the grave.

13 Feinberg, “Possibility of Faster-Than-light Particles,” Phys.
Rev. 159 (1967) 1089, http://www.scribd.com/doc/144943457/G-Feinberg-Possibility-of-Faster-Than-light-Particles-Phys-Rev-159-1967-1089
14 “A search for tachyons in cosmic ray showers,” Austr. J. Phys. 41 (1988) 93,
http://adsabs.harvard.edu/full/1988AuJPh..41...93C
15 Phys. Rev. D 1 (1970) 759

4.7.3 Tachyons and quantum mechanics 

When we add quantum mechanics to special relativity, we get 
quantum field theory, which sounds scary and can be quite technical, 
but is governed by some very simple principles. One of these prin- 
ciples is that “everything not forbidden is compulsory.” The phrase 
was popularized as a political satire of communism by T.H. White, 
but was commandeered by physicist Murray Gell-Mann to express 
the idea that any process not forbidden by a conservation law will 
in fact occur in nature at some rate. If tachyons exist, then it is 
possible to have two tachyons whose energy-momentum vectors add 
up to zero (problem 8, p. 112). This would seem to imply that 
the vacuum could spontaneously create tachyon-antitachyon pairs. 
Most theorists now interpret this as meaning that when tachyons 
pop up in the equations, it’s a sign that the assumed vacuum state 
is not stable, and will change into some other state that is the true 
state of minimum energy.

Problems

1 Criticize the following reasoning. Temperature is a measure 
of the energy per atom. In nonrelativistic physics, there is a min- 
imum temperature, which corresponds to zero energy per atom, but 
no maximum. In relativity, there should be a maximum temperature, 
which would be the temperature at which all the atoms are moving 
at c. 

2 In an old-fashioned cathode ray tube (CRT) television, elec- 
trons are accelerated through a voltage difference that is typically 
about 20 kV. At what fraction of the speed of light are the electrons 
moving? 

3 In nuclear beta decay, an electron or antielectron is typically 
emitted with an energy on the order of 1 MeV. In alpha decay, 
the alpha particle typically has an energy of about 5 MeV. In each 
case, do a rough estimate of whether the particle is nonrelativistic, 
relativistic, or ultrarelativistic. 

4 Suppose that the starship Enterprise from Star Trek has a
mass of $8.0 \times 10^7$ kg, about the same as the Queen Elizabeth 2.
Compute the kinetic energy it would have to have if it was moving
at half the speed of light. Compare with the total energy content of
the world’s nuclear arsenals, which is about $10^{21}$ J. √

5 Cosmic-ray neutrinos may be the fastest material particles in 
the universe. In 2013 the IceCube neutrino detector in Antarctica 
detected two neutrinos,¹⁶ dubbed Bert and Ernie, after the Sesame
Street characters, with energies in the neighborhood of 1 PeV =
$10^{15}$ eV. The higher energy was Ernie’s $1.14 \pm 0.17$ PeV. It is not
known what type of neutrino he was, nor do we have exact masses 
for neutrinos, but let’s assume m = 1 eV. Find Ernie’s rapidity. 

6 Science fiction stories often depict spaceships traveling through 
solar systems at relativistic speeds. Interplanetary space contains 
a significant number of tiny dust particles, and such a ship would 
sweep these dust particles out of a large volume of space, impacting 
them at high speeds. A 1975 experiment aboard the Skylab space 
station measured the frequency of impacts from such objects and 
found that a square meter of exposed surface experienced an impact 
from a particle with a mass of $\sim 10^{-15}$ kg about every few hours.
A relativistic object, sweeping through space much more rapidly,
would experience such impacts at rates of more like one every few
seconds. (Larger particles are significantly more rare, with the
frequency falling off as something like $m^{-8}$.) These particles didn’t
damage Skylab, because at relative velocities of $\sim 10^4$ m/s their
kinetic energies were on the order of microjoules. At relativistic speeds
it would be a different story. Real-world spacecraft are lightweight 
and rather fragile, so there would probably be serious consequences
from any impact having a kinetic energy of about $10^2$ J (comparable
to a bullet from a small handgun). (a) Find the speed at which a
starship could cruise through a solar system if frequent $10^2$ J
collisions were acceptable, assuming no object with a mass of more
than $10^{-15}$ kg. Express your result relative to $c$. (b) Find the speed
under the more conservative parameters of 10 J and $10^{-14}$ kg.

16 arxiv.org/abs/1304.5356

7 Example 4 on p. 61 derives the equation

$$x = \frac{1}{a}\cosh a\tau$$

for a particle moving with constant acceleration. (Note that a constant
of integration was taken to be zero, so that $x \ne 0$ at $\tau = 0$.)
(a) Rewrite this equation in metric units by inserting the necessary
factors of $c$. (b) If we had a rocket ship capable of accelerating
indefinitely at $g$, how much proper time would be needed in order
to travel the distance $\Delta x = 27{,}000$ light-years to the galactic center?
(This will be a flyby, so the ship accelerates all the way rather
than decelerating to stop at its destination.) Answer: 11 years (c)
An observer at rest relative to the galaxy explains the surprisingly 
short time calculated in part b as being due to the time dilation 
experienced by the traveler. How does the traveler explain it? 

8 Show, as claimed on p. 110, that if tachyons exist, then it is 
possible to have two tachyons whose momentum vectors add up to 
zero. 

9 (a) A free neutron (as opposed to a neutron bound into an 
atomic nucleus) is unstable, and undergoes spontaneous radioactive 
decay into a proton, an electron, and an antineutrino. The masses 
of the particles involved are as follows: 


neutron        $1.67495 \times 10^{-27}$ kg
proton         $1.67265 \times 10^{-27}$ kg
electron       $0.00091 \times 10^{-27}$ kg
antineutrino   $< 10^{-35}$ kg

Find the energy released in the decay of a free neutron. √

(b) Neutrons and protons make up essentially all of the mass of the 
ordinary matter around us. We observe that the universe around us 
has no free neutrons, but lots of free protons (the nuclei of hydrogen, 
which is the element that 90% of the universe is made of). We find 
neutrons only inside nuclei along with other neutrons and protons, 
not on their own. 

If there are processes that can convert neutrons into protons, we 
might imagine that there could also be proton-to-neutron conver- 
sions, and indeed such a process does occur sometimes in nuclei 
that contain both neutrons and protons: a proton can decay into a 
neutron, a positron, and a neutrino. A positron is a particle with 
the same properties as an electron, except that its electrical charge 
is positive. A neutrino, like an antineutrino, has negligible mass.
Although such a process can occur within a nucleus, explain why
it cannot happen to a free proton. (If it could, hydrogen would be 
radioactive, and you wouldn’t exist!) 

10 (a) Find a relativistic equation for the velocity of an object
in terms of its mass and momentum (eliminating $\gamma$). √

(b) Show that your result is approximately the same as the classical 
value, p/m, at low velocities. 

(c) Show that very large momenta result in speeds close to the speed 
of light. 

11 Expand the equation for relativistic kinetic energy
$K = m(\gamma - 1)$ in a Taylor series, and find the first two nonvanishing
terms. Show that the first term is the nonrelativistic expression. 

12 Expand the equation $p = m\gamma v$ in a Taylor series, and find
the first two nonvanishing terms. Show that the first term is the 
classical expression. 

13 An atom in an excited state emits a photon, ending up in 
a lower state. The initial state has mass $m_1$, the final one $m_2$. To
a very good approximation, we expect the energy $E$ of the photon
to equal $m_1 - m_2$. However, conservation of momentum dictates
that the atom must recoil from the emission, and therefore it carries 
away a small amount of kinetic energy that is not available to the 
photon. Find the exact energy of the photon, in the frame in which 
the atom was initially at rest. 

14 The following are the three most common ways in which 
gamma rays interact with matter: 

Photoelectric effect: The gamma ray hits an electron, is annihilated, 
and gives all of its energy to the electron. 

Compton scattering: The gamma ray bounces off of an electron, 
exiting in some direction with some amount of energy. 

Pair production: The gamma ray is annihilated, creating an electron 
and a positron. 

Example 10 on p. 92 shows that pair production can’t occur in a 
vacuum due to conservation of the energy-momentum four-vector. 
What about the other two processes? Can the photoelectric effect 
occur without the presence of some third particle such as an atomic 
nucleus? Can Compton scattering happen without a third particle?

15 This problem assumes you know some basic quantum physics.
The point of this problem is to estimate whether or not a neutron or
proton in an atomic nucleus is highly relativistic. Nuclei typically
have diameters of a few fm (1 fm = $10^{-15}$ m). Take a neutron or
proton to be a particle in a box of this size. In the ground state, 
half a wavelength would fit in the box. Use the de Broglie relation 
to estimate its typical momentum and thus its typical speed. How 
relativistic is it? 

16 Show, as claimed in example 11 on p. 94, that if a massless 
particle were to decay, Lorentz invariance requires that the
time-scale $\tau$ for the process be proportional to the particle’s energy. What
units would the constant of proportionality have? 

17 Derive the equation $T = \sqrt{3\pi/G\rho}$ given on page 106 for the
period of a rotating, spherical object that results in zero apparent 
gravity at its surface. 

18 Neutrinos with energies of ~ 1 MeV (the typical energy scale 
of nuclear physics) make up a significant part of the matter in our 
universe. If a neutrino and an antineutrino annihilate each other, 
the product is two back-to-back photons whose energies are equal 
in the center-of-mass frame. Should astronomers be able to detect 
these photons by selecting only those with the correct energy? 

19 In a certain frame of reference, a gamma ray with energy
$E_1$ is moving to the right, while a second gamma ray with energy
$E_2$ flies off to the left. (a) Find the mass of the system. (b) Find
the velocity of the center-of-mass frame, i.e., the frame of reference 
in which the total momentum is zero. 

20 In section 4.5.4 we proved the work-energy relation $dE/dx = F$
in the context of relativity. Recapitulate the derivation in the
context of pure Newtonian mechanics.

21 Section 4.3.5 on p. 94 discusses the possibility that the
photon has a small but nonzero mass $m$. One of the consequences
is that the electric field of an infinite, uniformly charged plane is
$E_x = \pm 2\pi k\sigma \exp(-\mu|x|)$, where $k$ is the Coulomb constant, $\sigma$ is the
charge per unit area, $\mu = mc/\hbar$, and $x$ is the distance from the plane.
When $m = 0$, we recover the result of standard electromagnetism.
The purpose of this problem is to analyze a laboratory experiment 
that can put an upper bound on m. 

Consider a rectangular, hollow, conducting box with charge placed 
on it. If m = 0, then Gauss’s law holds, and the field inside is 
exactly zero. We now consider the possibility that m > 0. We make 
the box very thin in the $x$ direction, with sides located at $x = \pm a$.
We refer to these two sides as the “plates.” The box’s extent in the
$y$ and $z$ directions is much greater than $a$, so that the density of
charge $\sigma$ on each of the two plates is nearly constant as long as we
stay away from the fringing fields at the edges. Consider a point 
located at x = b, with 0 < b < a, and far from the edges. Show that 
there is a nonvanishing interior field, which can be measured in this 
experiment by the fractional difference in electric potential 


$$\frac{V(a) - V(b)}{V(a)} = \frac{1}{2}\mu^2\left(a^2 - b^2\right) + \ldots,$$


where . . . indicates higher-order terms. 

Remark: The experiment is more practical when carried out us- 
ing a spherical geometry, since there are no fringing fields to worry 
about. The analysis comes out the same except that the factor of 
1/2 becomes 1/6. Experiments of this type were first carried out by
Cavendish in 1772, and then with a series of order-of-magnitude
improvements in precision by Plimpton in 1936 and Williams in 1971.


22 Potassium 40 is the strongest source of naturally occurring 
beta radioactivity in our environment. It decays according to 

$$^{40}\mathrm{K} \to {}^{40}\mathrm{Ca} + e^- + \bar{\nu}.$$

The energy released in the decay is 1.33 MeV. The energy is shared 
randomly among the products, subject to the constraint imposed by 
conservation of energy-momentum, which dictates that very little of 
the energy is carried by the recoiling calcium nucleus. Determine 
the maximum energy of the calcium, and compare with the typical 
energy of a chemical bond, which is a few eV. If the potassium is 
part of a molecule, do we expect the molecule to survive? Carry out 
the calculation first by assuming that the electron is ultrarelativistic, 
then without the approximation, and comment on how good the
approximation is. 
Chapter 5

Inertia (optional) 


5.1 What is inertial motion? 

On p. 47 I stated the following as an axiom of special relativity: 

P4. Inertial frames of reference exist. These are frames in 
which particles move at constant velocity if not subject to any 
forces. We can construct such a frame by using a particular 
particle, which is not subject to any forces, as a reference 
point. Inertial motion is modeled by vectors and parallelism. 

This is a typical modern restatement of Newton’s first law. It claims 
to define inertial frames and claims that they exist. 



a / The spherical chamber, shown 
in a cutaway view, has layers 
of shielding to exclude all known 
nongravitational forces. The three 
guns, at right angles to each 
other, fire bullets. Once the cham- 
ber has been calibrated by mark- 
ing the three dashed-line trajecto- 
ries under free-fall conditions, an 
observer inside the chamber can 
always tell whether she is in an in- 
ertial frame. 


5.1.1 An operational definition 

In keeping with the philosophy of operationalism (p. 26), we 
ought to be able to translate the definition into a method for testing 
whether a given frame really is inertial. Figure a shows an idealized 
variation on a device actually built for this purpose by Harold Waage
at Princeton as a lecture demonstration to be used by his partner in 
crime John Wheeler. We build a sealed chamber whose contents are
isolated as much as possible from outside forces. Of the four known 
forces of nature, the ones we know how to exclude are the strong 
nuclear force, the weak nuclear force, and the electromagnetic force. 
The strong nuclear force has a range of only about 1 fm ($10^{-15}$ m),
so to exclude it we merely need to make the chamber thicker than 
that, and also surround it with enough paraffin wax to keep out 
any neutrons that happen to be flying by. The weak nuclear force 
also has a short range, and although shielding against neutrinos is 
a practical impossibility, their influence on the apparatus inside will 
be negligible. To shield against electromagnetic forces, we surround 
the chamber with a Faraday cage and a solid sheet of mu-metal. 
Finally, we make sure that the chamber is not being touched by any 
surrounding matter, so that short-range residual electrical forces 
(sticky forces, chemical bonds, etc.) are excluded. That is, the 
chamber cannot be supported; it is free-falling. 

Crucially, the shielding does not exclude gravitational forces. 
There is in fact no known way of shielding against gravitational 
effects such as the attraction of other masses or the propagation of 
gravitational waves. (Because the shielding is spherical, it exerts no 
gravitational force of its own on the apparatus inside.) 

Inside, an observer carries out an initial calibration by firing
bullets along three Cartesian axes and tracing their paths, which she 
defines to be linear. (She can also make sure that the chamber isn’t 
rotating, e.g., by checking for velocity-dependent Coriolis forces.) 
After the initial calibration, she can always tell whether or not she 
is in an inertial frame. She simply has to fire the bullets, and see 
whether or not they follow the precalibrated paths. For example, 
she can detect that the frame has become noninertial if the chamber 
is rotated, allowed to rest on the ground, or accelerated by a rocket 
engine. 

Isaac Newton would have been extremely unhappy with our def- 
inition. “This is absurd,” he says. “The way you’ve defined it, my 
street in London isn’t inertial.” Newtonian mechanics only makes 
predictions if we input the correct data on all the mass in the uni- 
verse. Given this kind of knowledge, we can properly account for all 
the gravitational forces, and define the street in London as an iner- 
tial frame because in that frame, the trees and houses have zero total 
force on them and don’t accelerate. But spacetime isn’t Galilean. In 
special relativity’s description of spacetime, information propagates 
at a maximum speed of c, so there will always be distant parts of the 
universe that we can never know about, because information from 
those regions hasn’t had time to reach us yet. 

Rotation is noninertial Example 1 

Figure b shows a hypothetical example proposed by Einstein. 
One planet rotates about its axis and therefore has an equatorial 
bulge. The other planet doesn’t rotate and has none. Both Newtonian
mechanics and special relativity make these predictions,
and although the scenario is idealized and unrealistic, there is no 
doubt that their predictions are correct for this situation, because 
the two theories have been tested in similar cases. This also 
agrees with our operational definition of inertial motion on p. 118. 
Rotational motion is noninertial. 

This bothered Einstein for the following reason. If the inhabitants 
of the two planets can look up in the sky at the “fixed stars,” they 
have a clear explanation of the reason for the difference in shape. 
People on planet A don’t see the stars rise or set, and they infer 
that this is because they live on a nonrotating world. The inhab- 
itants of planet B do see the stars rise and set, just as they do 
here on earth, so they infer, just as Copernicus did, that their 
planet rotates. 

But suppose, Einstein said, that the two planets exist alone in an 
otherwise empty universe. There are no stars. Then it’s equally 
valid for someone on either planet to say that it’s the one that 
doesn’t rotate. Each planet rotates relative to the other planet, 
but the situation now appears completely symmetric. Einstein 
took this argument seriously and felt that it showed a defect in 
special relativity. He hoped that his theory of general relativity 
would fix this problem, and predict that in an otherwise empty uni- 
verse, neither planet would show any tidal bulge. In reality, further 
study of the general theory of relativity showed that it made the 
same prediction as special relativity. Theorists have constructed 
other theories of gravity, most prominently the Brans-Dicke the- 
ory, that do behave more in the way Einstein’s physical intuition 
expected. Precise solar-system tests have, however, supported 
general relativity rather than Brans-Dicke gravity, so it appears 
well settled now that rotational motion really shouldn’t be consid- 
ered inertial. 

5.1.2 Equivalence of inertial and gravitational mass 

All of the reasoning above depends on the perfect cancellation 
referred to by Newton: since gravitational forces are proportional to 
mass, and acceleration is inversely proportional to mass, the result 
is that accelerations caused by gravity are independent of mass. 
This is the universality of free fall, which was famously observed by 
Galileo, figure c. 

Suppose that, on the contrary, we had access to some mat- 
ter that was immune to gravity. It’s sold under the brand name 
FloatyStuff™. The cancellation fails now. Let’s say that alien 
gangsters land in a flying saucer, kidnap you out of your back yard, 
konk you on the head, and take you away. When you regain con- 
sciousness, you’re locked up in a sealed cabin in their spaceship. 
You pull your keychain out of your pocket and release it, and you 



c / According to Galileo’s stu- 
dent Viviani, Galileo dropped 
a cannonball and a musketball 
simultaneously from the leaning 
tower of Pisa, and observed that 
they hit the ground at nearly the 
same time. This contradicted 
Aristotle’s long-accepted idea 
that heavier objects fell faster. 
observe that it accelerates toward the floor with an acceleration that
seems quite a bit slower than what you’re used to on earth, perhaps 
a third of a gee. There are two possible explanations for this. One 
is that the aliens have taken you to some other planet, maybe Mars, 
where the strength of gravity is a third of what we have on earth. 
The other is that your keychain didn’t really accelerate at all: you’re 
still inside the flying saucer, which is accelerating at a third of a gee, 
so that it was really the deck that accelerated up and hit the keys. 

There is absolutely no way to tell which of these two scenarios is 
actually the case — unless you happen to have a chunk of FloatyStuff 
in your other pocket. If you release the FloatyStuff and it hovers 
above the deck, then you’re on another planet and experiencing 
genuine gravity; your keychain responded to the gravity, but the 
FloatyStuff didn’t. But if you release the FloatyStuff and see it hit 
the deck, then the flying saucer is accelerating through outer space. 



d / Loránd Eötvös (1848-1919).


5.2 The equivalence principle 

5.2.1 Equivalence of acceleration to a gravitational field 

The nonexistence of FloatyStuff in our universe is a special case 
of the equivalence principle. The equivalence principle states that 
an acceleration (such as the acceleration of the flying saucer) is al- 
ways equivalent to a gravitational field, and no observation can ever 
tell the difference without reference to something external. (And 
suppose you did have some external reference point — how would 
you know whether it was accelerating?) 

5.2.2 Eötvös experiments

FloatyStuff would be an extreme example, but if there was any 
violation of the universality of free fall, no matter how small, then 
the equivalence principle would be falsified. Since Galileo’s time, ex- 
perimental methods have had several centuries in which to improve, 
and the second law has been subjected to similar tests with ex- 
ponentially improving precision. For such an experiment in 1993, 1 
physicists at the University of Pisa (!) built a metal disk out of 
copper and tungsten semicircles joined together at their flat edges. 
They evacuated the air from a vertical shaft and dropped the disk 
down it 142 times, using lasers to measure any tiny rotation that 
would result if the accelerations of the copper and tungsten were 
very slightly different. The results were statistically consistent with 
zero rotation, and put an upper limit of $1 \times 10^{-9}$ on the fractional
difference in acceleration $|g_\text{copper} - g_\text{tungsten}|/g$. Experiments of this
type are called Eötvös experiments, after Loránd Eötvös, who did
the first modern, high-precision versions.


1 Carusotto et al., “Limits on the violation of g-universality with a Galileo-type
experiment,” Phys Lett A183 (1993) 355. Freely available online at
researchgate.net.
The artificial horizon Example 2

The pilot of an airplane cannot always easily tell which way is up. 
The horizon may not be level simply because the ground has an 
actual slope, and in any case the horizon may not be visible if the 
weather is foggy. One might imagine that the problem could be 
solved simply by hanging a pendulum and observing which way 
it pointed, but by the equivalence principle the pendulum cannot 
tell the difference between a gravitational field and an acceler- 
ation of the aircraft relative to the ground — nor can any other 
accelerometer, such as the pilot’s inner ear. For example, when 
the plane is turning to the right, accelerometers will be tricked into 
believing that “down” is down and to the left. To get around this 
problem, airplanes use a device called an artificial horizon, which 
is essentially a gyroscope. The gyroscope has to be initialized 
when the plane is known to be oriented in a horizontal plane. No 
gyroscope is perfect, so over time it will drift. For this reason the 
instrument also contains an accelerometer, and the gyroscope is 
automatically restored to agreement with the accelerometer, with 
a time-constant of several minutes. If the plane is flown in cir- 
cles for several minutes, the artificial horizon will be fooled into 
indicating that the wrong direction is vertical. 



e / An artificial horizon. 


5.2.3 Gravity without gravity 

We live immersed in the earth’s gravitational field, and that 
is where we do almost all of our physics experiments. It’s surprising,
then, that special relativity can be confirmed in earthbound
experiments, sometimes with phenomenal precision, as in the
Ives-Stilwell experiment’s 10-significant-figure test of the relativistic
Doppler shift equation (p. 56). How can this be, since special relativity
is supposed to be the version of relativity that can’t handle
gravity? The equivalence principle provides an answer. If the only 
gravitational effect on your experiment is a uniform field g, then it’s 
valid for you to describe your experiment as having been done in a 
region without any gravity, but in a laboratory whose floor happened 
to have been accelerating upward with an acceleration $-g$. Special
relativity works just fine in such situations, because switching into 
an accelerated frame of reference doesn’t have any effect on the flat- 
ness of spacetime (p. 46). Note that Gravity Probe B (p. 46) orbited 
the earth, so the field it experienced varied in direction, causing the 
above argument to fail; the effects it observed were not explainable 
by special relativity. 

5.2.4 Gravitational Doppler shifts 

For an example of a specifically gravitational experiment that
is explainable by special relativity, and that has actually been carried
out, consider the following. In a laboratory accelerating upward, a
light wave emitted from the floor would be Doppler-shifted toward lower
frequencies when observed at the ceiling, because of the change in the receiver’s
f / 1. A light wave is emitted
upward from the floor of the ele- 
vator. The elevator accelerates 
upward. 2. By the time the light 
wave is detected at the ceiling, 
the elevator has changed its 
velocity, so the wave is detected 
with a Doppler shift. 



g / Pound and Rebka at the 
top and bottom of the tower. 


velocity during the wave’s time of flight. The effect is given by 
$\Delta f/f \approx -a\Delta x/c^2$, where a is the lab’s acceleration, $\Delta x$ is the
height from floor to ceiling, and c is the speed of light (problem 1).
In units with c = 1, we have $\Delta f/f \approx -a\Delta x$.

By the equivalence principle, we find that when such an experiment
is done in a gravitational field g, there should be a gravitational
effect on frequency $\Delta f/f \approx -g\Delta x$. This can be expressed more
compactly as $\Delta f/f \approx -\Delta\Phi$, where $\Phi$ is the gravitational potential,
i.e., the gravitational energy per unit mass.

In 1959, Pound and Rebka 2 carried out an experiment in a tower 
at Harvard. Gamma rays were emitted by a $^{57}$Fe source at the
bottom and detected at the top, having risen $\Delta x$ = 22.6 m. The
equivalence principle predicts a fractional frequency shift due to
gravity of $2.46 \times 10^{-15}$. This is very small, and would normally have
been masked by recoil effects (problem 13, p. 113), but by exploiting
the Mössbauer effect Pound and Rebka measured the shift to be
$(2.56 \pm 0.25) \times 10^{-15}$.
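
As a quick numerical check of the prediction quoted above, the following Python sketch evaluates the magnitude of the shift, g times the height divided by c squared, for the height of the tower. The value g = 9.80 m/s² and all of the names in the code are my own assumptions for illustration; the height and the measured result are the ones given in the text.

```python
# Rough check of the predicted gravitational Doppler shift for the tower.
# Assumption: g = 9.80 m/s^2, a typical value at the earth's surface.
G_FIELD = 9.80          # gravitational field, m/s^2 (assumed)
C = 2.99792458e8        # speed of light, m/s
HEIGHT = 22.6           # height risen by the gamma rays, m (from the text)

def fractional_shift(g, dx, c=C):
    """Magnitude of Delta f / f = g*dx/c^2 for a wave rising a height dx."""
    return g * dx / c**2

predicted = fractional_shift(G_FIELD, HEIGHT)
measured, uncertainty = 2.56e-15, 0.25e-15   # values quoted in the text

print(f"predicted |Delta f/f| = {predicted:.2e}")
print("within the quoted error bar:", abs(measured - predicted) <= uncertainty)
```

With these inputs the predicted magnitude comes out to about 2.46e-15, and the measured value lies within its stated uncertainty of the prediction.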

5.2.5 A varying metric 

In the Pound-Rebka experiment, the nuclei emitting the gamma
rays at frequency f can be thought of as little clocks. Each wave
crest that propagates upward is a signal saying that the clock has
ticked once. An observer at the top of the tower finds that the
signals come in at the lower frequency $f'$, and concludes naturally
that the clocks at the bottom had been slowed down due to some 
kind of time dilation effect arising from gravity. 

This may seem like a big conceptual leap, but it has been con- 
firmed using atomic clocks. In a 1978 experiment by Iijima and Fu- 
jiwara, figure h, identical atomic clocks were kept at rest at the top 
and bottom of a mountain near Tokyo. The discrepancies between 
the clocks were consistent with the predictions of the equivalence 
principle. The gravitational Doppler shift was also one of the effects 
that led to the non-null result of the Hafele-Keating experiment
(p. 15), in which atomic clocks were flown around the world aboard
commercial passenger jets. Every time you use the GPS system, 
you are making use of these effects. 

Starting from only the seemingly innocuous assumption of the 
equivalence principle, we are led to surprisingly far-reaching con- 
clusions. We find that time flows at different rates depending on 
the height within a gravitational field. Since the metric can be interpreted
as a measure of the amount of proper time along a given
world-line, we conclude that we cannot always express the metric in
the familiar form $\tau^2 = (+1)\Delta t^2 + (-1)\Delta x^2$ with fixed coefficients
$+1$ and $-1$. Suppose that the t coordinate is defined by radio synchronization.
Then the $+1$ in the metric needs to be replaced with

2 Phys. Rev. Lett. 4 (1960) 337 
h / A graph showing the time
difference between two atomic 
clocks. One clock was kept at Mi- 
taka Observatory, at 58 m above 
sea level. The other was moved 
back and forth to a second ob- 
servatory, Norikura Corona Sta- 
tion, at the peak of the Norikura 
volcano, 2876 m above sea level. 
The plateaus on the graph are 
data from the periods when the 
clocks were compared side by 
side at Mitaka. The difference be- 
tween one plateau and the next 
shows a gravitational effect on the 
rate of flow of time, accumulated 
during the period when the mobile 
clock was at the top of Norikura. 


approximately $1 + 2\Phi$, where we take $\Phi = 0$ by convention at the
height of the standard clock that coordinates the synchronization. 
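
To get a feeling for the size of this effect, here is a rough Python sketch that applies the modified time coefficient of the metric to the two altitudes listed in the caption of figure h. I assume a uniform field g = 9.80 m/s², so that the dimensionless potential difference is g times the height difference divided by c squared, and I take the potential to be zero at the lower observatory. The function names are mine and are not taken from the analysis of the original experiment.

```python
import math

G_FIELD = 9.80         # m/s^2, assumed uniform over the height difference
C = 2.99792458e8       # speed of light, m/s
H_MITAKA = 58.0        # m above sea level (figure h caption)
H_NORIKURA = 2876.0    # m above sea level (figure h caption)

def potential(height, reference_height):
    """Dimensionless potential g*(h - h_ref)/c^2, zero at the reference clock."""
    return G_FIELD * (height - reference_height) / C**2

def excess_rate(phi):
    """sqrt(1 + 2*phi) - 1, the fractional amount by which a clock at rest at
    potential phi outruns the reference clock. Written with log1p/expm1
    because phi is tiny and a direct subtraction would lose precision."""
    return math.expm1(0.5 * math.log1p(2.0 * phi))

phi = potential(H_NORIKURA, H_MITAKA)
rate = excess_rate(phi)
ns_per_day = rate * 86400 * 1e9   # nanoseconds gained per day on the mountain

print(f"Phi = {phi:.3e}")
print(f"mountain clock gains roughly {ns_per_day:.1f} ns per day")
```

Under these assumptions the gain is a few tens of nanoseconds per day, which is why the effect only shows up on the graph after the mobile clock has spent many days at the summit.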

Keep in mind that although we have connected gravity to the 
measurement apparatus of special relativity, there is no curvature 
of spacetime, so what we are doing here is still special relativity, not 
general relativity. In fact there is nothing more mysterious going 
on here than a renaming of spacetime events through a change of 
coordinates. The renaming might be convenient if we were using 
earth-based reference points to measure the x coordinate. But if 
we felt like it, we could switch to a good inertial frame of reference, 
one that was free-falling. In this frame, we would obtain exactly the 
same prediction for the results of any experiment. For example, the 
free-falling observer would explain the result of the Pound-Rebka 
experiment as arising from the upward acceleration of the detector 
away from the source. 
Problems


1 Carry out the details of the calculation of the gravitational 
Doppler effect in section 5.2.4. 

2 A student argues as follows. At the center of the earth, there 
is zero gravity by symmetry. Therefore time would flow at the same 
rate there as at a large distance from the earth, where there is also 
zero gravity. Although we can’t actually send an atomic clock to the 
center of the earth, interpolating between the surface and the center 
shows that a clock at the bottom of a mineshaft would run faster 
than one on the earth’s surface. Find the mistake in this argument. 

3 Somewhere in outer space, suppose there is an astronomical 
body that is a sphere consisting of solid lead. Assume the Newtonian 
expression $\Phi = -GM/r$ for the potential in the space outside the
object. Make an order of magnitude estimate of the diameter it 
must have if the gravitational time dilation at its surface is to be 
a factor of 2 relative to time as measured far away. (Under these 
conditions of strong gravitational fields, special relativity is only a 
crude approximation, and that’s why we won’t get more than an 
order of magnitude estimate out of this.) What is the gravitational 
field at its surface? If I have a week’s vacation from work, and I 
spend it lounging on the beach on the lead planet, do I experience 
two weeks of relaxation, or half a week? 
Chapter 6

Waves 


This chapter and the preceding one have good, solid physical titles. 
Inertia. Waves. But underlying the physical content is a thread of 
mathematics designed to teach you a language for describing space- 
time. Without this language, the complications of relativity rapidly 
build up and become unmanageable. In section 5.2.5, we saw that 
there are physically compelling reasons for switching back and forth 
between different coordinate systems — different ways of attach- 
ing names to the events that make up spacetime. A toddler in a 
bilingual family gets a payoff for switching back and forth between 
asking Mama in Spanish for dulces and alerting Daddy in English 
that Barbie needs to be rescued from falling off the couch. She may 
bounce back and forth between the two languages in a single sen- 
tence — a habit that linguists call “code switching.” In relativity, 
we need to build fluency in a language that lets us talk about actual 
phenomena without getting hung up on the naming system. 

6.1 Frequency 

6.1.1 Is time’s flow constant?

The simplest naming task is in 0 + 1 dimensions: a time-line 
like the ones in history class. If we name the points in time A, B, 
C, . . . or 1, 2, 3, . . . , or Bush, Clinton, Bush, . . . , how do we know 
that we’re marking off equal time intervals? Does it make sense 
to imagine that time itself might speed up and slow down, or even 
start and stop? The second law of thermodynamics encourages us 
to think that it could. If the universe had existed for an infinite 
time, then entropy would have maximized itself — a long time ago, 
presumably — and we would not exist, because the heat death of 
the universe would already have happened. 

6.1.2 Clock-comparison experiments 

But what would it actually mean empirically for time’s rate of 
flow to vary? Unless we can tie this to the results of experiments, it’s 
nothing but cut-rate metaphysics. In a Hollywood movie where time 
could stop, the scriptwriters would show us the stopping through 
the eyes of an observer, who would stroll past frozen waterfalls and 
snapshotted bullets in mid-flight. The observer’s brain is a kind 
of clock, and so is the waterfall. We’re left with what’s known 
as a clock-comparison experiment. To date, all clock-comparison 
experiments have given null results. Matsakis et al. 1 found that
pulsars match the rates of atomic clocks with a drift of less than
about $10^{-6}$ seconds over 10 years. Guéna et al. 2 observed that
atomic clocks using atoms of different isotopes drifted relative to
one another by no more than about $10^{-16}$ per year. Any non-null
result would have caused serious problems for relativity. One of 
the expectations in an Aristotelian description of spacetime is that 
the motion of material objects on earth would naturally slow down 
relative to celestial phenomena such as the rising and setting of the 
sun. The relativistic interpretation of time dilation as an effect on 
time itself (p. 26) also depends crucially on the null results of these 
experiments. 

6.1.3 Birdtracks notation 

As a simple example of clock comparison, let’s imagine using 
the hourly emergence of a mechanical bird from a pendulum-driven 
cuckoo clock to measure the rate at which the earth spins. There 
is clearly a kind of symmetry here, since we could equally well take 
our planet’s rotation as the standard and use it to measure the 
frequency with which the bird pops out of the door. Schematically, 
let’s represent this measurement process with the following notation, 
which is part of a system called birdtracks: 3

$c \rightarrow e = 24$

Here c represents the cuckoo clock and e the rotation of the earth. 
Although the measurement relationship is nearly symmetric, the 
arrow has a direction, because, for example, the measurement of 
the earth’s rotational period in terms of the clock’s frequency is 
$c \rightarrow e = (1\ \text{hr}^{-1})(24\ \text{hr}) = 24$, but the clock’s period in terms of the
earth’s frequency is $e \rightarrow c = 1/24$. We say that the relationship is
not symmetric but “dual.” By the way, it doesn’t matter how we
arrange these diagrams on the page. The notations $c \rightarrow e$ and $e \leftarrow c$
mean exactly the same thing, and expressions like this can even be 
drawn vertically. 

Suppose that e is a displacement along some one-dimensional 
line of time, and we want to think of it as the thing being measured. 
Then we expect that the measurement process represented by c pro- 
duces a real-valued result and is a linear function of e. Since the 
relationship between c and e is dual, we expect that c also belongs 
to some vector space. For example, vector spaces allow multiplica- 
tion by a scalar: we could double the frequency of the cuckoo clock 

1 Astronomy and Astrophysics 326 (1997) 924,
adsabs.harvard.edu/full/1997A%26A...326..924M

2 arxiv.org/abs/1205.4235

3 The system used in this book follows the one defined by Cvitanović, which
was based closely on a graphical notation due to Penrose. For a more complete
exposition, see the Wikipedia article “Penrose graphical notation” and
Cvitanović’s online book at birdtracks.eu.
by making the bird come out on the half hour as well as on the
hour, forming 2c. Measurement should be a linear function of both 
vectors; we say it is “bilinear.” 
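
The bilinearity of the measurement can be illustrated with a toy model in Python. Here I represent the one-dimensional covector c and vector e by single numbers, in units of inverse hours and hours. That representation, and the name `measure`, are my own illustrative choices and not part of the birdtracks notation itself.

```python
def measure(covector, vector):
    """The real number obtained by measuring a vector with a covector.
    In one dimension this reduces to multiplying the two components."""
    return covector * vector

c = 1.0    # frequency of the cuckoo clock, in hr^-1
e = 24.0   # rotational period of the earth, in hr

print(measure(c, e))        # clock periods per rotation of the earth
print(measure(2 * c, e))    # the bird also comes out on the half hour: 2c

# Linearity in each slot separately is what "bilinear" means.
assert measure(2 * c, e) == 2 * measure(c, e)
assert measure(c, e + e) == measure(c, e) + measure(c, e)

# The dual measurement swaps the roles: the earth's frequency measures
# the clock's period.
earth_frequency = 1 / e   # in hr^-1
clock_period = 1 / c      # in hr
print(measure(earth_frequency, clock_period))
```

Doubling either input doubles the count, and the last line reproduces the dual result of 1/24 discussed in section 6.1.3.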

6.1.4 Duality 

The two vectors c and e have different units, hr$^{-1}$ and hr, and
inhabit two different one-dimensional vector spaces. The “flavor” of
the vector is represented by whether the arrow goes into it or comes
out. Just as we used notation like $\vec{v}$ in freshman physics to tell
vectors apart from scalars, we can employ arrows in the birdtracks
notation as part of the notation for the vector, so that instead of
writing the two vectors as c and e, we can notate them as $c \rightarrow$ and
$\rightarrow e$. Performing a measurement is like plumbing. We join the two
“pipes” in $c \rightarrow\ \rightarrow e$ and simplify to $c \rightarrow e$.

A confusing and nonstandardized jungle of notation and terminology
has grown up around these concepts. For now, let’s refer to
a vector such as $\rightarrow e$, with the arrow coming in, simply as a “vector,”
and the type like $c \rightarrow$ as a “covector.” In the one-dimensional
example of the earth and the cuckoo clock, the roles played by the 
two things were completely equivalent, and it didn’t matter which 
one we expressed as a vector and which as a covector. 

6.2 Phase 

6.2.1 Phase is a scalar 

In section 1.3.1, p. 22, we defined a (Lorentz) invariant as a 
quantity that was unchanged under rotations and Lorentz boosts. 

A measurement such as $c \rightarrow e = 24$ is an invariant because it is simply
a count. We’ve counted the number of periods. In fact, a count is 
not just invariant under rotations and boosts but under any well- 
behaved change of coordinates — the technical condition being that 
each coordinate in each set is a differentiable function of each co- 
ordinate in the other set. Such a change of coordinates is called 
a diffeomorphism. For example, a uniform scaling of the coordinates
$(t, x, y, z) \rightarrow (kt, kx, ky, kz)$, which is analogous to a change of
units, is all right as long as k is nonzero. A quantity that stays the 
same under any diffeomorphism is called a scalar. Since a Lorentz 
transformation is a diffeomorphism, every scalar is a Lorentz invari- 
ant. Not every Lorentz invariant is a scalar. 

The determinant of the metric Example 1 

Minkowski coordinates can be defined as coordinates in which 
the metric has the standard form $g = \text{diag}(1, -1, -1, -1)$. If we
rescale these coordinates according to $(t, x, y, z) \rightarrow (kt, kx, ky, kz)$,
then the metric changes according to $g \rightarrow k^{-2} g$. To keep track of


4 The appropriate relativistic way of defining a change of units is subject to 
some ambiguity. See section 9.6, p. 207. 
how “un-Minkowski” this scaling is, we could use the determinant
of the metric $\det(g) = -k^{-8}$. This determinant tells us how many
coordinate-grid boxes fit in a unit volume, and it is of interest in 
a more general context than this example of uniform rescaling, 
e.g., it serves a similar function when converting from Cartesian 
coordinates to polar coordinates. 

Under a Lorentz transformation or a rotation, the metric retains 
its standard form, and therefore $\det(g)$ is Lorentz invariant. Another
way of seeing this is that spacetime volume is Lorentz invariant
(p. 49), so that a Lorentz transformation doesn’t change
how many coordinate-grid boxes fit in a unit volume. 

But although $\det(g)$ is a Lorentz invariant, it is not a scalar, because
it changes under the transformation described above.
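
The claims in this example can be checked numerically. The sketch below, which is only an illustration under my own conventions, transforms the metric components with the Jacobian of a change of coordinates and compares the determinant for a uniform rescaling and for a boost. The helper names and the choice k = 2, v = 0.6 (with c = 1) are assumptions made for the demonstration.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(a):
    """Determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def transform_metric(g, jac):
    """New metric components J^T g J, where jac[a][m] = d(old x^a)/d(new x^m)."""
    return matmul(transpose(jac), matmul(g, jac))

MINKOWSKI = [[1.0, 0.0, 0.0, 0.0],
             [0.0, -1.0, 0.0, 0.0],
             [0.0, 0.0, -1.0, 0.0],
             [0.0, 0.0, 0.0, -1.0]]

def scaling_jacobian(k):
    """New coordinates are k times the old ones, so old = new / k."""
    return [[1.0 / k if i == j else 0.0 for j in range(4)] for i in range(4)]

def boost_jacobian(v):
    """Old coordinates in terms of new ones for a boost along x, with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, gamma * v, 0.0, 0.0],
            [gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

k = 2.0
det_scaled = det(transform_metric(MINKOWSKI, scaling_jacobian(k)))
det_boosted = det(transform_metric(MINKOWSKI, boost_jacobian(0.6)))

print(det_scaled, -(k ** -8))   # rescaling changes the determinant
print(det_boosted)              # a boost leaves it at -1, up to rounding
```

The rescaled determinant matches the formula with the eighth power of k, while the boosted one is unchanged, which is the distinction between a scalar and a mere Lorentz invariant.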

In birdtracks notation, any expression that has no external arrows
at all represents a scalar. Since the expression $c \rightarrow e = 24$ has no
external arrows, only internal ones, it represents a scalar. Another
way of describing this measurement is as a phase. If we prefer to
measure the phase $\phi$ in units of cycles, then we have $\phi = c \rightarrow e$. If we
like radians, we can use $\phi = 2\pi\, c \rightarrow e$.

6.2.2 Scaling 

A convenient way of summarizing all of our categories of variables
is by their behavior when we rescale our coordinates. If we
switch our time unit from hours to minutes, the number of apples
in a bowl is unchanged, the earth’s period of rotation gets 60
times bigger, and the frequency of the cuckoo clock changes by a
factor of 1/60. In other words, a quantity u under rescaling of coordinates
by a factor $\alpha$ becomes $\alpha^p u$, where the exponents $p = -1$, $0$,
and $+1$ correspond to covectors, scalars, and vectors, respectively.
We can therefore see that these distinctions are of interest even in 
one dimension, contrary to what one would have expected from the 
freshman-physics concept of a vector as something transforming in 
a certain way under rotations. 
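
The scaling rule can be spelled out in a few lines of Python. This is a minimal sketch in which each quantity is tagged by hand with its exponent p; the function name `rescale` and the count of five apples are invented for the illustration.

```python
import math

def rescale(value, alpha, p):
    """Numerical value of a quantity after the coordinates are rescaled by
    alpha: p = -1 for a covector, 0 for a scalar, +1 for a vector."""
    return alpha ** p * value

ALPHA = 60   # switching the time unit from hours to minutes

apples = rescale(5, ALPHA, 0)                 # a count: unchanged
earth_period = rescale(24.0, ALPHA, +1)       # 24 hr becomes 1440 min
clock_frequency = rescale(1.0, ALPHA, -1)     # 1 per hr becomes 1/60 per min

print(apples, earth_period, clock_frequency)

# The measurement of the period by the clock is a scalar, so the factors of
# alpha cancel and the count of 24 survives the change of units.
print(math.isclose(clock_frequency * earth_period, 24.0))
```

The cancellation in the last line is the one-dimensional version of the statement that an expression with no external arrows is a scalar.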

In section 1.3.1 (p. 22), we defined an invariant as a quantity 
that did not change under rotations or Lorentz boosts, i.e., one that 
was independent of the frame of reference. For a scalar we have 
the even more restrictive condition that it must not change under 
any change of coordinates. For example, area in 1 + 1-dimensional 
spacetime is an invariant, but it’s not a scalar; it changes when we 
rescale our coordinates. 

6.3 The frequency-wavenumber covector 

Generalizing from 0 + 1 dimensions to 3 + 1, we could have an
observer moving inertially along velocity vector →o, while counting
the phase φ (in radians) of a plane wave (perhaps a water wave or
an electromagnetic wave) that is washing over her. Since φ is just a
count, it’s clearly a scalar. That means that we have some function
that takes as its input a vector →o and gives as an output the scalar
φ. This function has all the right characteristics to be described
as a measurement ω→o of →o with some covector ω→, and in a
constructive style of mathematics this is a good way of defining a
covector: it’s a linear function from the space of vectors to the real
numbers. We call ω→ the frequency-wavenumber covector, or just
the frequency covector for short. If →o represents one second as
measured on the clock of this observer, then ω→o is the frequency
ω measured by this observer in units of radians per second. If the
same observer considers →s to be a vector of simultaneity with a
length of one meter, then ω→s is the observer’s measurement of the
wavenumber k, defined as 2π divided by the wavelength.
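
To make the idea of a covector as a linear function concrete, here is a minimal Python sketch in 1 + 1 dimensions. The component values and the function name measure are hypothetical choices made only for illustration; they do not come from the text.

```python
def measure(covector, vector):
    """Apply a covector to a vector: sum the products of matching components."""
    return sum(w * x for w, x in zip(covector, vector))

# Hypothetical frequency covector in (t, x) components:
# 6 radians per second and 2 radians per meter.
omega_cov = (6.0, 2.0)

o = (1.0, 0.0)  # one second on the observer's clock
s = (0.0, 1.0)  # one meter along the observer's line of simultaneity

frequency = measure(omega_cov, o)   # what the observer calls omega
wavenumber = measure(omega_cov, s)  # what the observer calls k

# Linearity: measuring 2o + 3s gives 2*frequency + 3*wavenumber.
combo = tuple(2 * a + 3 * b for a, b in zip(o, s))
assert measure(omega_cov, combo) == 2 * frequency + 3 * wavenumber
```

The output of measure is a plain number with no components, which is the sense in which the phase count is a scalar.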

6.3.1 Visualization 

In more than one dimension, there are natural ways of visualizing 
the different vector spaces inhabited by vectors and covectors. A 
vector is an arrow. A covector can be visualized as a set of parallel, 
evenly spaced lines on a topographic map, a/2, with an arrowhead 
to show which way is “uphill.” The act of measurement consists of 
counting how many of these lines are crossed by a certain vector, 
a/3. 

Parallelism between vectors and covectors Example 2 

It seems visually obvious in figure a/3 that the vector and covector 
are almost, but not exactly, parallel, since the arrowheads point 
in almost the same direction. Ordinarily, parallelism of nonzero 
vectors u and v would be expressed by the existence of a real 
number a such that u = av. But vectors and covectors are differ- 
ent kinds of beasts, belonging to different vector spaces. Scaling 
up a zebra will never produce a giraffe. If there is no metric, then 
this is simply a fact of life: there is no natural way to define
parallelism between a vector v and a covector ω.

But if we have a metric, then we can define a magnitude for the
vector v in a/3, and keep that magnitude constant while rotating v.
If the metric is Euclidean, then this corresponds to rigidly rotating
the arrow on the page, and ω→v is maximized for a certain
orientation, which we define as the condition for parallelism. If the
metric is noneuclidean, then things get a bit more complicated,
but the same ideas apply if the vectors are either both spacelike
or both timelike. For example, if both are timelike, then ω→v is
minimized by parallelism, because the Cauchy-Schwarz inequality
is reversed (see sec. 1.5.1, p. 36).



a / 1. A displacement vector.
2. A covector. 3. Measurement 
is reduced to counting. The 
observer, represented by the 
displacement vector, counts 24 
wavefronts. 



b / Constant-temperature curves 
for January in North America, at 
intervals of 4°C. The tempera- 
ture gradient at a given point is a 
covector. 


6.3.2 The gradient 

Given a scalar field φ, its gradient ∇φ at any given point is a
covector. The frequency covector is the gradient of the phase. In
birdtracks notation, we indicate this by writing it with an
outward-pointing arrow, (∇φ)→. Because gradients occur so frequently,
birdtracks notation has a special shorthand for them, which is simply a
circle:

[birdtracks diagram: φ enclosed in a circle, with an outward-pointing arrow]

This notation can also be extended to the case where the thing
being differentiated is not a scalar, but then some complications are
encountered when the coordinates are not Minkowski; see section
9.4.1, p. 196.

Cosmological observers Example 3 

Time is relative, so what do people mean when they say that the 
universe is 13.8 billion years old? If a hypothetical observer had 
been around since shortly after the big bang, the time elapsed 
on that observer’s clock would depend on the observer’s world-line.
Two such observers, who had had different world-lines, could
have differing clock readings.

Modern cosmologists aren’t naive about time dilation. They have
in mind a cosmologically preferred world-line for their observer.
One way of constructing this world-line is as follows. Over time,
the temperature T of the universe has decreased. (We define this
temperature locally, but we average over large enough regions
so that local variations don’t matter.) The negative gradient of
this temperature, −∇T, is a covector that points in a preferred
direction in spacetime, and a preferred world-line for an observer
is one whose velocity vector v is always parallel to −∇T, in the
sense defined in example 2 above. 

6.4 Duality 

6.4.1 Duality in 3+1 dimensions 

In our original 0 + 1-dimensional example of the cuckoo clock
and the earth, we had duality: the measurements c→e = 24 and
e→c = 1/24 really provided the same information, and it didn’t
matter whether we made our scalar out of covector c→ and vector
→e or covector e→ and vector →c. All these quantities were simply
clock rates, which could be described either by their frequencies
(covectors) or their periods (vectors).

To generalize this to 3 + 1 dimensions, we need to use the metric,
a piece of machinery that we have never had to employ since
the beginning of the chapter. Given a vector →r, suppose we knew
how to produce its covector version r→. Then we could hook up the
plumbing to form r→r, which is just a number. What number could
it be? The only reasonable possibility is the squared magnitude of
r, which we calculate using the metric as r² = g(r, r). Since we can
think of covectors as functions that take vectors to real numbers,
clearly r→ should be the function f defined by f(x) = g(r, x).

Finding the dual of a given vector Example 4

> Given the vector →v = (3, 4) in 1 + 1-dimensional Minkowski
coordinates, find the covector v→, i.e., its dual.

> Our goal is to write out an explicit expression for the covector in
component form,

v→ = (a, b).

To define these components, we have to have some basis in
mind, consisting of one timelike observer-vector →o and one
spacelike vector of simultaneity →s. Since we’re doing this in Minkowski
coordinates (section 1.2, p. 20), let’s notate these as →t̂ and →x̂,
where the hats indicate that these are unit vectors in the sense
that t̂² = 1 and x̂² = −1. Writing v→ in terms of a and b means
that we’re identifying v→ with the function f defined by f(x) =
g(→v, x). Therefore

f(→t̂) = a and f(→x̂) = b

or

g(→v, →t̂) = 3 = a and g(→v, →x̂) = −4 = b.


The result of the formidable, fancy-looking calculation in example 4
was simply to take the vector

(3, 4)

and flip the sign of its spacelike component to give its dual, the
covector

(3, −4).

Looking back at why this happened, it was because we were using
Minkowski coordinates, and in Minkowski coordinates the form of
the metric is g(p, q) = (+1)p_t q_t + (−1)p_x q_x + .... Therefore, we
can always find duals in this way, provided that (1) we’re using
Minkowski coordinates, and (2) the signature of the metric is, as
assumed throughout this book, + − − −, not − + + +.

Going both ways Example 5

> Assume Minkowski coordinates and signature + − − −. Given
the vector

→e = (8, 7)

and the covector

f→ = (1, 2),

find e→ and →f.

> By the rule established above, we can find e→ simply by flipping
the sign of the 7,

e→ = (8, −7).

To find →f, we need to ask what vector (a, b), if we flipped the sign
of b, would give us (a, −b) = (1, 2). Obviously this is

→f = (1, −2).

In other words, flipping the sign of the spacelike part of a vector
is also the recipe for changing covectors into vectors.

Example 5 shows that in Minkowski coordinates, the operation
of changing a covector to the corresponding vector is the same as
that of changing a vector to its covector. Thus, the dual of a dual is
the same thing you started with. In this respect, duality is similar
to arithmetic operations such as x → −x and x → 1/x. That is,
duality is a self-inverse operation: it undoes itself, like getting two
sex-change operations in a row, or switching political parties twice in
a country that has a two-party system. Birdtracks notation makes
this self-inverse property look obvious, since duality means switching
an inward arrow to an outward one or vice versa, and clearly doing
two such switches gives back the original notation. This property
was established in example 5 by using Minkowski coordinates and
assuming the signature to be + − − −, but it holds without these
assumptions (problem 1, p. 142).

In the general case where the coordinates may not be Minkowski,
the above analysis plays out as follows. Covectors and vectors are
represented by row and column vectors. The metric can be specified
by a matrix g so that the inner product of column vectors p and q
is given by pᵀgq, where T represents the transpose. Rerunning
the same logic with these additional complications, we find that the
dual of a vector q is (gq)ᵀ, while the dual of a covector ω is (ωg⁻¹)ᵀ,
where g⁻¹ is the inverse of the matrix g.
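
The matrix recipe can be tried out numerically. The following Python sketch works in 1 + 1 dimensions; the helper names (lower, raise_, and so on) and the second, non-Minkowski metric are hypothetical and chosen only for illustration. Row and column vectors are both stored as plain lists, so the transposes are implicit.

```python
def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lower(g, q):
    """Dual of a vector q: the components of (g q)^T."""
    return mat_vec(g, q)

def raise_(g, w):
    """Dual of a covector w: the components of (w g^-1)^T."""
    g_inv = inverse_2x2(g)
    return [sum(w[i] * g_inv[i][j] for i in range(2)) for j in range(2)]

# Minkowski coordinates, signature + -: reproduces example 4.
minkowski = [[1.0, 0.0], [0.0, -1.0]]
v = [3.0, 4.0]
v_dual = lower(minkowski, v)
assert v_dual == [3.0, -4.0]
assert raise_(minkowski, v_dual) == v

# A hypothetical symmetric metric in some non-Minkowski coordinates.
g = [[4.0, 1.0], [1.0, -9.0]]
q = [2.0, 5.0]
roundtrip = raise_(g, lower(g, q))
assert all(abs(a - b) < 1e-12 for a, b in zip(roundtrip, q))
```

The round trip in the last lines is a numerical instance of the self-inverse property: lowering and then raising returns the original vector, even though the metric is not diagonal.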

6.4.2 Change of basis 

We saw in section 6.2.2 that in 0 + 1 dimensions, vectors and
covectors have opposite scaling properties under a change of units,
so that switching our base unit from hours to minutes caused our
time vectors to go up by a factor of 60, while our frequency covectors
went down by the same factor. This behavior was necessary in order
to keep scalar products the same. In more than one dimension, the
notion of changing units is replaced with that of a change of basis.
In linear algebra, row vectors and column vectors act like covectors
and vectors; they are dual to each other. Let B be a matrix made
of column vectors, representing a basis for the column-vector space.
Then a change of basis for a row vector r is expressed as r' = rB,
while the same change of basis for a column vector c is c' = B⁻¹c.
We then find that the scalar product is unaffected by the change of
basis, since r'c' = rBB⁻¹c = rc.
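
As a small numerical check of this invariance, here is a Python sketch with a hypothetical 2×2 basis matrix B (chosen to have determinant 1 so that its inverse has simple entries). The numbers and helper names are assumptions for illustration only.

```python
def row_times_matrix(r, m):
    """Row vector times matrix: r' = r B."""
    return [sum(r[i] * m[i][j] for i in range(len(r))) for j in range(len(m[0]))]

def matrix_times_column(m, c):
    """Matrix times column vector: c' = B^-1 c."""
    return [sum(m[i][j] * c[j] for j in range(len(c))) for i in range(len(m))]

def scalar_product(r, c):
    """Row vector times column vector."""
    return sum(a * b for a, b in zip(r, c))

# Hypothetical basis matrix and its inverse.
B = [[2.0, 1.0],
     [1.0, 1.0]]
B_inv = [[1.0, -1.0],
         [-1.0, 2.0]]

r = [3.0, -4.0]  # a row vector (covector components)
c = [8.0, 7.0]   # a column vector (vector components)

r_new = row_times_matrix(r, B)
c_new = matrix_times_column(B_inv, c)

# The components change, but the scalar product does not.
assert scalar_product(r_new, c_new) == scalar_product(r, c)
```

Here the components change from (3, −4) and (8, 7) to (2, −1) and (1, 6), while the scalar product stays at −4.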

In the important special case where B is a Lorentz transformation,
this means that covectors transform under the inverse
transformation, which can be found by flipping the sign of v. This fact
will be important in the following section.

6.5 The Doppler shift and aberration 

6.5.1 Doppler shift 

As an example, we generalize our previous discussion of the 
Doppler shift of light to 3 + 1 dimensions. 

For clarity, let’s first show how the 1 + 1-dimensional case works
in our new notation. For a wave traveling to the left, we have
ω→ = (ω, ω) (not (ω, −ω); see figure d/1). We now want to
transform into the frame of an observer moving to the right with
velocity v relative to the original frame. Because ω→ is a covector,
we do this using the inverse Lorentz transformation. An ordinary
Lorentz transformation would take a lightlike vector (ω, ω)
to (ω/D, ω/D) (see section 3.2). The inverse Lorentz transformation
gives (Dω, Dω). The frequency has been shifted upward by the
factor D, as established previously.

In 3 + 1 dimensions, a spatial plane is determined by the light’s
direction of propagation and the relative velocity of the source and
observer, so this case reduces without loss of generality to 2 + 1
dimensions. The frequency four-vector must be lightlike, so its most
general possible form is (ω, ω cos θ, ω sin θ), where θ is interpreted
as the angle between the direction of propagation and the relative
velocity. In 2 + 1 dimensions, a Lorentz boost along the x axis looks
like this:

t' = γt − vγx
x' = −vγt + γx
y' = y

The inverse transformation is found by flipping the sign of v. Putting
our frequency vector through an inverse Lorentz boost, we find

ω' = γω(1 + v cos θ).

For θ = 0 the Doppler factor reduces to γ(1 + v) = D, recovering the
1 + 1-dimensional result. For θ = 90°, we have ω' = γω, which is
interpreted as a pure time dilation effect when the source’s motion
is transverse to the line of sight.
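
The Doppler formula can be checked numerically by pushing the frequency covector through the inverse boost. The Python sketch below does this in 2 + 1 dimensions with c = 1; the function names and the sample value v = 0.6 are hypothetical choices for illustration.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def inverse_boost(components, v):
    """Inverse Lorentz boost along x: the ordinary boost with the sign of v flipped."""
    g = gamma(v)
    t, x, y = components
    return (g * t + v * g * x, v * g * t + g * x, y)

def doppler(omega, theta, v):
    """Frequency seen by an observer moving at v, for a wave at angle theta."""
    cov = (omega, omega * math.cos(theta), omega * math.sin(theta))
    return inverse_boost(cov, v)[0]

v = 0.6                                # gamma = 1.25
D = math.sqrt((1 + v) / (1 - v))       # 1+1-dimensional Doppler factor, here 2

assert math.isclose(doppler(1.0, 0.0, v), D)                  # theta = 0
assert math.isclose(doppler(1.0, math.pi / 2, v), gamma(v))   # theta = 90 degrees

# The transformed covector is still lightlike.
wt, wx, wy = inverse_boost((1.0, math.cos(1.0), math.sin(1.0)), v)
assert abs(wt**2 - wx**2 - wy**2) < 1e-12
```

Only the timelike component of the transformed covector is needed for the frequency; the spatial components carry the change in direction that is discussed under aberration.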

To see the power of the mathematical tools we’ve developed in 
this chapter, you may wish to look at sections 6 and 7 of Einstein’s 
1905 paper on special relativity, where a lengthy derivation is needed 
in order to arrive at the same result. 

6.5.2 Aberration 

Imagine that rain is falling vertically while you drive in a
convertible with the top down. To you, the raindrops appear to be
moving at some nonzero angle relative to vertical. This is referred
to as aberration: a world-line’s direction changes depending on one’s
frame of reference. In the street’s frame of reference, the angle
between the rain’s three-velocity and the car’s is θ = 90°, but in the
car’s frame θ' ≠ 90°. In this example, aberration is a large effect
because the car’s speed v is comparable to the velocity u of the
raindrops. To a snail crawling along the sidewalk at a much lower
v, the effect would be small. Using the small-angle approximation
tan ε ≈ ε, we find that for small v, the difference Δθ = θ' − θ would
be approximately v/u, in units of radians.

Compared to a ray of light, we’re all like snails. For example, the
earth’s orbital speed is about v ≈ 10⁻⁴ in units where the speed of
light u = 1, so we expect a maximum effect of about 10⁻⁴ radians,
or 20" of arc, which is small but not negligible for a telescope with
a high-quality mount, being used at high magnification.

This estimate of astronomical aberration of light is roughly right,
but we don’t expect it to be exact, both because of the small-angle
approximation and because we calculated it using a Galilean picture
of spacetime. Let’s calculate the exact result. As shown in example
8 on p. 137, the direction of propagation of a light wave lies along
the vector that is the dual to its frequency covector. Let’s call this
direction of propagation →u. Reusing the expression for ω→ defined
in section 6.5.1, and arbitrarily fixing →u’s timelike component to
be 1, we have

→u = (1, −cos θ, −sin θ).

When this vector undergoes a boost v along the x axis it becomes

→u' = (γ(1 + v cos θ), γ(−v − cos θ), −sin θ).

The original angle θ = tan⁻¹(u_y/u_x) has been transformed to θ' =
tan⁻¹(u'_y/u'_x), the result being

tan θ' = sin θ / [γ(cos θ + v)].
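
As a rough numerical check, the exact formula can be compared with the earlier estimate of about 10⁻⁴ radians for the earth’s orbital motion. In the Python sketch below, the function name aberrated_angle is hypothetical, and the two-argument arctangent is used so that the angle comes out in the correct quadrant; only the magnitude of the shift is compared with the estimate.

```python
import math

def aberrated_angle(theta, v):
    """Angle theta' from tan(theta') = sin(theta) / (gamma * (cos(theta) + v)), c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return math.atan2(math.sin(theta), g * (math.cos(theta) + v))

v_earth = 1e-4            # earth's orbital speed in units of c
theta = math.pi / 2       # light arriving at right angles to the motion

delta = aberrated_angle(theta, v_earth) - theta
arcsec = math.degrees(abs(delta)) * 3600

# The magnitude of the shift agrees with the small-angle estimate v/u.
assert abs(abs(delta) - v_earth) < 1e-10
print(f"shift = {abs(delta):.3e} rad = {arcsec:.1f} arcseconds")
```

At this speed the relativistic factor γ differs from 1 by only about 5 parts in 10⁹, so the exact and Galilean results are indistinguishable in practice.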


A test of special relativity Example 6 

An assumption underlying this treatment of aberration was that 
the speed of light was u = c, regardless of the velocity of the 
source. Not all prerelativistic theories had this property, and one 
would expect that in such a theory, aberration would not be in 
accord with the relativistic result. In particular, suppose that we 
believed in Galilean spacetime, so that when a distant galaxy, 
receding from us at some speed w, emitted a ray of light toward 
us, the light’s velocity in our frame was u = c - w. That is, we 
imagine a theory in which emitting a ray of light is like shooting 
a bullet from a gun. Since aberration effects go approximately 
like v/u, we would expect that the reduced u would lead to more 
aberration compared to the prediction of relativity.

To test theories of this type, Heckmann⁵ used a 24-inch reflector
at Hamburg to take high-magnification photographic plates of a
star field in Ursa Major containing 11 stars inside the Milky Way
and 5 distant galaxies. Measurements of Doppler shifts showed 
that the galaxies were receding from us at velocities of about w = 
0.05c, whereas stars within the Milky Way move relative to us 
at speeds that are negligible in comparison. If, contrary to the 
relativistic prediction, this led to a 5% decrease in u, then we 
would expect about a 5% increase in aberration for the galaxies 
compared to the stars. 

Over the course of a year, the earth’s orbit carries it toward and 
away from Ursa Major, so that in the earth’s frame of reference, 
the stars and galaxies have varying velocities relative to us, and 
the ~ 20" aberration effect oscillates in direction. If the effect was 
different for the galaxies and the stars, then they ought to shift 
their apparent positions relative to one another. The shift ought 
to be on the order of 5% of 20", or one second of arc. The results 
from the observations showed that these relative positions did not 
appear to vary at all over the course of a year, with the average 
relative shift being 0.00 ± 0.06" of arc. This difference in aberra- 
tion is consistent with zero, as predicted by special relativity. 



c / 1 . The cube’s rest frame. 2. The observer’s frame. 3. The observer’s view of the cube, severely 
distorted by aberration. 


The view of an ultrarelativistic observer Example 7 

Figure c shows a visualization for an observer flying through a
cube at v = 0.99. In c/1, the cube is shown in its own rest frame,
where it has sides of unit length, and the observer, having already
passed through, lies one unit to the right of the cube’s center. The
observer is facing to the right, away from the cube. The dashed
line is a ray of light that travels from point P to the observer, and
in this frame it appears as though the ray, arriving from θ = 162°,
would not make it into the observer’s eye.

5 Annales d’Astrophysique 23 (1960) 410,
adsabs.harvard.edu/abs/1960AnAp...23..410H.

d / The surfer moves directly to the right with velocity vector u.
The wave also propagates to the right.

But in the observer’s frame, c/2, the ray is at θ' = 47°, so it actually
does fall within her field of view. The cube is length-contracted
by a factor γ ≈ 7. The ray was emitted earlier, when the cube was
out in front of the observer, at the position shown by the dashed 
outline. 

The image seen by the observer is shown in c/3. The circular
outline defining the field of view represents θ' = 50°. Note that
the relativistic length contraction is not at all what an observer 
sees optically. The optical observation is influenced by length 
contraction, but also by aberration and by the time it takes for 
light to propagate to the observer. The time of propagation is 
different for different parts of the cube, so in the observer’s frame, 
c/2, rays from different points had to be emitted when the cube 
was at different points in its motion, if those rays were to reach 
the eye. 

A group at Australian National University has produced anima- 
tions of similar scenes, which can be found online by searching 
for “optical effects of special relativity.” 

It’s fun to imagine the view of an observer aboard an ultrarelativistic
starship. For v sufficiently close to 1, any angle θ < 180°
transforms to a small θ'. Thus, all light coming to this observer
from the surrounding stars — even those in extreme backward 
directions! — is gathered into a small, bright patch of light that 
appears to come from straight ahead. Some visible light would 
be shifted into the extreme ultraviolet and infrared, while some 
infrared and ultraviolet light would become visible. 
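
To see how strongly the angles are compressed, here is a short Python sketch that applies the aberration formula tan θ' = sin θ / [γ(cos θ + v)] to a few arrival angles. The speed v = 0.9999 and the list of angles are hypothetical values picked for illustration, as is the function name.

```python
import math

def aberrated_angle_deg(theta_deg, v):
    """Aberrated angle in degrees, from tan(theta') = sin(theta) / (gamma * (cos(theta) + v))."""
    theta = math.radians(theta_deg)
    g = 1.0 / math.sqrt(1.0 - v * v)
    return math.degrees(math.atan2(math.sin(theta), g * (math.cos(theta) + v)))

v = 0.9999  # hypothetical starship speed in units of c
angles = [30.0, 90.0, 150.0, 170.0]
seen = [aberrated_angle_deg(a, v) for a in angles]

for a, s in zip(angles, seen):
    print(f"theta = {a:5.1f} deg  ->  theta' = {s:5.2f} deg")

# Even light arriving from 170 degrees ends up within 15 degrees of straight ahead.
assert max(seen) < 15.0
```

Raising v further shrinks the bright patch even more, while light from exactly θ = 180° is the one direction that is never brought forward.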

6.6 Phase and group velocity 

6.6.1 Phase velocity 

A wavefront is a line or surface of constant phase. In a snapshot 
of a wave at one moment of time, the direction of propagation of 
the wave is across the wavefronts. The visual situation is different 
in a spacetime diagram. In 1 + 1 dimensions, figure d/1, suppose 
that the lines represent the crests of the water waves. The surfer is
on top of a crest, riding along with it. His velocity vector →u is in
the spacetime direction that lies on top of the wavefront, not across 
it. Clearly both his motion and the propagation of the wave are to 
the right, not to the left as we might imagine based on experience 
with snapshots of waves.

In 2 + 1 dimensions, d/2, the surfer’s velocity is visualized as an
arrow lying within a plane of constant phase. Given the wave’s phase
information, there is more than one possible arrow of this kind. We
could try to resolve the ambiguity by requiring that the arrow’s
projection into the xy plane be perpendicular to the intersection of
the wavefronts with that plane, but (with the exception of the case
where the wave travels at c, example 8, p. 137) this prescription
gives results that change depending on our frame of reference, and
the changes are not describable by a Lorentz transformation of the
velocity vector. This shows that in the general case, the phase
information of the wave, encoded in the frequency covector ω→, does
not describe the direction of the wave’s propagation through space.
At most it tells us the wave’s phase velocity, ω/k, which is not really
a velocity. All of these are symptoms of the fact that a velocity is
supposed to be a vector, but ω→ is a covector. The phase velocity
lacks physical interest, because it is not the velocity at which any
“stuff” moves.

Velocity vector of a light wave, given its phase Example 8 

We’ve seen that in general, the information about the phase of a
wave encoded in ω→ does not determine its direction of propagation.
The exception is a wave, such as a light wave, that propagates
at c. Let a world-line of propagation of the wave lie along
the vector →v. In the case of a wave propagating at c, we have
v² = 0 (so that →v can’t have the usual normalization for a velocity
vector), and the dispersion relation is simply ω² = 0. Since the
phase stays constant along a world-line of propagation, ω→v = 0.
We therefore find that v and ω are two nonzero, lightlike vectors
that are orthogonal to each other. But as shown in problem 10 on
p. 39, this implies that the two vectors are parallel. Thus if we’re
given the covector ω→, we just have to compute its dual →ω to
find the direction of propagation.

6.6.2 Group velocity 

The phase velocity is not the velocity at which “stuff” is
transmitted by the wave. The velocity of the stuff is called the group
velocity. To have a meaningfully defined group velocity, we need to
have a wave that is modulated, because an unmodulated wave is
an infinite sine wave that stretches off to infinity, and such an
unmodulated wave does not transmit any energy or information. An
unmodulated wave has the same frequency covector ω→ throughout
all of spacetime, i.e., the same frequency ω and wavenumber k. One
way of describing a modulated wave is by how ω→ does change.

But the different components of ω→ are not free to change in
any randomly chosen way. Normally they are constrained by a
dispersion relation. For example, surface waves in deep water obey
the constraint C = 0, where C = ω⁴ − a²k² (figure e) and a is a
constant with units of acceleration, relating to the acceleration of
gravity. (Since the water is infinitely deep, there is no other scale
that could enter into the constraint.)

e / Points on the graph satisfy the dispersion relation C = 0
for water waves. At a given point on the graph, the covector (∇C)→
tells us the group velocity.

Now if a certain bump on the envelope with which the wave
is modulated visits spacetime events P and Q, then whatever
frequency and wavelength the wave has near the bump are observed
to be the same at P and Q. In general, k and ω are constant along
the spacetime displacement of any point on the envelope, so the
spacetime displacement →r from P to Q must satisfy the condition
(∇ω)→r = 0.


In addition, ∇ω must be tangent to the surface of constraint
C = 0, so that the wave always obeys the constraint. Thus given a
point ω→ in frequency space, the direction of propagation r must be
uniquely determined by the constraint. Suppose C is a well-behaved
function, so that it is approximately a linear function of any small
change Δω, i.e., in 1 + 1 dimensions we have

ΔC ≈ (∂C/∂ω) Δω + (∂C/∂k) Δk.

In this approximation, ΔC is a linear function that acts on a covector
Δω and gives back a scalar. In other words, ΔC acts like a vector
with components

ΔC = (∂C/∂ω, ∂C/∂k).

This vector is parallel to r, so that it points in the wave’s direction
of propagation through spacetime, and tells us its group velocity
(∂C/∂k)/(∂C/∂ω). In our example of water waves, a calculation
shows that the group velocity is ±a/2ω, which is half the phase
velocity.
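
The water-wave result can be checked numerically. The Python sketch below estimates the partial derivatives of C = ω⁴ − a²k² by central differences at a point satisfying C = 0, and compares the magnitude of (∂C/∂k)/(∂C/∂ω) with a/2ω and with half the phase velocity. The values a = 9.8 and ω = 2, the step size, and the function names are hypothetical choices for illustration.

```python
def C(omega, k, a):
    """Dispersion function for deep-water surface waves; the constraint is C = 0."""
    return omega**4 - a**2 * k**2

def partials(omega, k, a, h=1e-6):
    """Central-difference estimates of (dC/d omega, dC/dk)."""
    dC_domega = (C(omega + h, k, a) - C(omega - h, k, a)) / (2 * h)
    dC_dk = (C(omega, k + h, a) - C(omega, k - h, a)) / (2 * h)
    return dC_domega, dC_dk

a = 9.8              # constant with units of acceleration (illustrative value)
omega = 2.0          # illustrative frequency
k = omega**2 / a     # chosen so that C(omega, k, a) = 0

dC_domega, dC_dk = partials(omega, k, a)
group_velocity = dC_dk / dC_domega
phase_velocity = omega / k

# Compare magnitudes only; the overall sign depends on the sign of k.
assert abs(abs(group_velocity) - a / (2 * omega)) < 1e-6
assert abs(abs(group_velocity) - phase_velocity / 2) < 1e-6
```

For a different dispersion relation, only the function C needs to be replaced; the ratio of partial derivatives is computed the same way.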


6.7 Abstract index notation 

This chapter has centered on the physics of waves, but along the way 
we’ve found it helpful to build up some mathematical ideas such 
as covectors, which have applications in a much broader physical 
context. In this section we’ll develop some related notation. 

Expressions in birdtracks notation such as

[birdtracks diagram: C enclosed in a circle, with an arrow leading from it into s]

can be awkward to type on a computer, which is why we’ve
already been occasionally resorting to more linear notations such as
(∇C)→s. For more complicated birdtracks, the diagrams sometimes
look like complicated electrical schematics, and the problem of
generating them on a keyboard gets more acute. There is in fact a
systematic way of representing any such expression using only ordinary
subscripts and superscripts. This is called abstract index notation,
and was introduced by Roger Penrose at around the same time he
invented birdtracks. For practical reasons, it was the abstract index
notation that caught on.

The idea is as follows. Suppose we wanted to describe a compli- 
cated birdtrack verbally, so that someone else could draw it. The 
diagram would be made up of various smaller parts, a typical one
looking something like the scalar product u→v. The verbal
instructions might be: “We have an object u with an arrow coming out of
it. For reference, let’s label this arrow as a. Now remember that 
other object v I had you draw before? There was an arrow coming 
into that one, which we also labeled a. Now connect up the two 
arrows labeled a.” 

Shortening this lengthy description to its bare minimum, Penrose
renders it like this: u_a v^a. Subscripts depict arrows coming out of
a symbol (think of water flowing from a tank out through a pipe 
below). Superscripts indicate arrows going in. When the same letter 
is used as both a superscript and a subscript, the two arrows are to 
be piped together. 

Abstract index notation evolved out of an earlier one called 
the Einstein summation convention, in which superscripts and sub- 
scripts referred to specific coordinates. For example, we might take 
0 to be the time coordinate, 1 to be x, and so on. A symbol like u_λ
would then indicate a component of the dual vector u, which could
be its x component if λ took on the value 1. Repeated indices were
summed over. 

The advantage of the birdtrack and abstract index notations is 
that they are coordinate-independent, so that an equation written 
in them is valid regardless of the choice of coordinates. The Einstein 
and abstract-index notations look very similar, so for example if we 
want to take a general result expressed in abstract-index notation 
and apply it in a specific coordinate system, there is essentially no 
translation required. In fact, the two notations look so similar that 
we need an explicit way to tell which is which, so that we can tell 
whether or not a particular result is coordinate-independent. We 
therefore use the convention that Latin indices represent abstract 
indices, whereas Greek ones imply a specific coordinate system and
can take on numerical values, e.g., λ = 1.

The following are some examples of equivalent equations written 
side by side in birdtracks and abstract index notations. 

Observer o’s displacement in spacetime is a vector:

→o    o^a

In Einstein notation, it’s awkward to express a vector as a whole,
because in a notation like o^λ, λ is supposed to take on a particular
value. If we used o^λ to mean the whole vector, it would be an abuse
of notation. In abstract index notation, however, the a is simply a
name we gave to a pipe coming into vector o; the fact that we didn’t
need to refer to the name in order to connect it to some other pipe
is irrelevant.

A wave’s frequency is a covector:

ω→    ω_a

An observer experiences proper time τ:

o→o = τ²    o_a o^a = τ²

There are no external arrows in the birdtracks version, and in the
abstract-index version all lower indices (pipes coming out) have been
paired with upper indices (pipes coming in); this indicates that the
proper time is a scalar, and therefore independent of any choice of
coordinate system. In Einstein notation, this becomes o_λ o^λ, with
an implied sum over the repeated index, Σ_λ o_λ o^λ. The λ refers to
a particular coordinate system, so in the Einstein notation it is no
longer obvious that the equation holds regardless of our choice of
coordinates.
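
The implied sum can be spelled out in a few lines of Python. This sketch works in 1 + 1-dimensional Minkowski coordinates with signature + −; the displacement (5, 3), the boost speed 0.6, and the function names are hypothetical values chosen for illustration. It evaluates Σ_λ o_λ o^λ in two frames and confirms that the result is the same.

```python
import math

ETA = [[1.0, 0.0],
       [0.0, -1.0]]   # Minkowski metric components, signature + -

def lower(vec):
    """o_lambda = sum over mu of eta[lambda][mu] * o^mu."""
    return [sum(ETA[lam][mu] * vec[mu] for mu in range(2)) for lam in range(2)]

def contract(cov, vec):
    """Explicit sum over the repeated index lambda."""
    return sum(cov[lam] * vec[lam] for lam in range(2))

def boost(vec, v):
    """Components of a vector in a frame moving at velocity v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    t, x = vec
    return [g * (t - v * x), g * (x - v * t)]

o = [5.0, 3.0]                      # hypothetical displacement: 5 units of t, 3 of x
tau_squared = contract(lower(o), o) # 25 - 9

o_boosted = boost(o, 0.6)
tau_squared_boosted = contract(lower(o_boosted), o_boosted)

assert tau_squared == 16.0
assert math.isclose(tau_squared_boosted, tau_squared)
```

The individual components change under the boost (here the observer ends up at rest, with a purely timelike displacement), but the contracted quantity does not, which is what the paired upper and lower indices advertise.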

A world-line along which a wave propagates lies along a vector 
that is orthogonal to the wave’s frequency covector: 

ω→u = 0    ω_a u^a = 0

The frequency covector is the gradient of the phase:

ω→ = (∇φ)→    ω_a = ∇_a φ



The following grammatical rules apply to both abstract-index 
and Einstein notation: 

1. Repeated indices occur in pairs, with one up and one down 
and the two factors multiplying each other. 

2. Disregarding indices that are paired as in rule 1, all other 
indices must appear uniformly in all terms and on both sides 
of an equation. “Appear uniformly” means that an index can’t 
be missing and can’t be a superscript in some places but a 
subscript in others. 

3. For reasons to be explained in section 7.4, p. 148, a partial
derivative with respect to a coordinate, such as ∂/∂x^k, is
treated as if the index were a subscript, and conversely ∂/∂x_k
is considered to have a superscripted k.

In abstract-index notation, rule 1 follows because the indices are
simply labels describing how, in birdtracks notation, the pipes should
be hooked up. Violating rule 1, as in an expression like v^a v^a,
produces a quantity that does not actually behave as a scalar. An
example of a violation of rule 2 is v^a = ω_a. This doesn’t make
sense, for the same reason that it doesn’t make sense to equate a
row vector to a column vector in linear algebra. Even if an equation
like this did hold in one frame of reference, it would fail in another,
since the left-hand and right-hand sides transform differently under
a boost.

In section 6.4.1 we discussed the notion of finding the covector 
that was dual to a given vector, and the vector dual to a given 
covector. Because the distinction between vectors and covectors 
is represented in index notation by placing the index on the top 
or on the bottom, relativists refer to this kind of thing as raising 
and lowering indices. In general, this type of manipulation is called 
“index gymnastics.” Here’s what raising and lowering indices looks 
like. 

Converting a vector to its covector form:

u_a = g_ab u^b

Changing a covector to the corresponding vector:

u^a = g^ab u_b

The symbol g^ab refers to the inverse of the matrix g_ab.

Problems


1 In section 6.4.1, I proved that duality is a self-inverse
operation, invoking Minkowski coordinates and assuming the signature
to be + − − −. Show that these assumptions were not necessary.

Chapter 7

Coordinates 


In your previous study of physics, you’ve seen many examples where 
one coordinate system makes life easier than another. For a block 
being pushed up an inclined plane, the most convenient choice may 
be to tilt the x and y axes. To find the moment of inertia of a 
disk we use cylindrical coordinates. The same is true in relativ- 
ity. Minkowski coordinates are not always the most convenient. In 
chapter 6 we learned to classify physical quantities as covectors, 
scalars, and vectors, and we learned rules for how these three types 
of quantities transformed in two special changes of coordinates: 

1. When we rescale all coordinates by a factor a, the components
of vectors, scalars, and covectors scale by a^p, where p = +1,
0, and −1, respectively.

2. Under a boost, the three cases require respectively the Lorentz 
transformation, no transformation, and the inverse Lorentz 
transformation. 

In this chapter we’ll learn how to generalize this to any change of
coordinates,[1] and also how to find the form of the metric expressed
in non-Minkowski coordinates.

7.1 An example: accelerated coordinates 

Let’s start with a concrete example that has some physical interest. 
In section 5.2, p. 120, we saw that we could have “gravity without 
gravity:” an experiment carried out in a uniform gravitational field 
can be interpreted as an experiment in flat spacetime (so that spe- 
cial relativity applies), but with the measurements expressed in the 
accelerated frame of the earth’s surface. In the Pound-Rebka ex- 
periment, all of the results could have been expressed in an inertial 
(free-falling) frame of reference, using Minkowski coordinates, but
this would have been extremely inconvenient, because, for example, 
they didn’t want to drop their expensive atomic clocks and take the 
readings before the clocks hit the floor and were destroyed. 

Since this is “gravity without gravity,” we don’t actually need
a planet cluttering up the picture. Imagine a universe consisting
of limitless, empty, flat spacetime. Describe it initially using
Minkowski coordinates (t, x, y, z). Now suppose we want to find a
new set of coordinates (T, X, Y, Z) that correspond to the frame of
reference of an observer aboard a spaceship accelerating in the x
direction with a constant acceleration.

[1] We do require the change of coordinates to be smooth in the sense defined
on p. 127, i.e., it should be a diffeomorphism.

a / The transformation between Minkowski coordinates (t, x) and the
accelerated coordinates (T, X).

The Galilean answer would be $X = x - \frac{1}{2}at^2$. But this is
unsatisfactory from a relativistic point of view for several reasons. At
$t = c/a$ the observer would be moving at the speed of light, but
relativity doesn’t allow frames of reference moving at c (section 3.4,
p. 59). At $t > c/a$, the observer’s motion would be faster than c,
but this is impossible in 3+1 dimensions (section 3.8, p. 69).

These problems are related to the fact that the observer’s proper
acceleration, i.e., the reading on an accelerometer aboard the ship,
isn’t constant if $x = \frac{1}{2}at^2$. We saw in example 4 on p. 61 that
constant proper acceleration is described by $x = \frac{1}{a}\cosh a\tau$,
$t = \frac{1}{a}\sinh a\tau$, where $\tau$ is the proper time. For this motion, the velocity
only approaches c asymptotically. This suggests the following for
the relationship between the two sets of coordinates:

$$t = X \sinh T$$
$$x = X \cosh T$$
$$y = Y$$
$$z = Z$$

For example, if the ship follows a world-line $(T, X) = (\tau, 1)$, then its
motion in the unaccelerated frame is $(t, x) = (\sinh\tau, \cosh\tau)$, which
is of the desired form with a = 1.

The (T, X, Y, Z) coordinates, called Rindler coordinates, have
many, but not all, of the properties we would like for an accelerated
frame. Ideally, we’d like to have all of the following: (1) the proper
acceleration is constant for any world-line of constant (X, Y, Z); (2)
the proper acceleration is the same for all such world-lines, i.e., the
fictitious gravitational field is uniform; and (3) the description of
the accelerated frame is just a change of coordinates, i.e., we’re just
talking about the flat spacetime of special relativity, with events
renamed. It turns out that we can pick two out of three of these,
but it’s not possible to satisfy all three at the same time. Rindler
coordinates satisfy conditions 1 and 3, but not 2. This is because the
proper acceleration of a world-line of constant (X, Y, Z) can easily
be shown to be 1/X, which depends on X. Thus we don’t speak of
Rindler coordinates as “the” coordinates of an accelerated observer.
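The claim that a world-line of constant X has proper acceleration 1/X can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, units are chosen with c = 1, and it assumes that proper time along such a world-line is $\tau = XT$ (which follows from $dt^2 - dx^2 = X^2\,dT^2$ at fixed X). The second derivatives are estimated by finite differences.

```python
import math

def rindler_to_minkowski(T, X):
    """Map Rindler (T, X) to Minkowski (t, x), in units with c = 1."""
    return X * math.sinh(T), X * math.cosh(T)

def proper_acceleration(X, T=0.3, h=1e-3):
    """Estimate the proper acceleration on the world-line of constant X.

    Assumption: proper time on this world-line is tau = X*T, so the
    event at proper time tau has Rindler time T = tau/X.
    """
    def event(tau):
        return rindler_to_minkowski(tau / X, X)

    tau = X * T
    (t0, x0), (t1, x1), (t2, x2) = event(tau - h), event(tau), event(tau + h)
    # Central second differences approximate d^2 t/d tau^2 and d^2 x/d tau^2.
    a_t = (t0 - 2 * t1 + t2) / h**2
    a_x = (x0 - 2 * x1 + x2) / h**2
    # The acceleration vector is spacelike; its magnitude is sqrt(a_x^2 - a_t^2).
    return math.sqrt(a_x**2 - a_t**2)

for X in (0.5, 1.0, 2.0):
    print(X, proper_acceleration(X), 1 / X)
```

The two printed columns should agree closely, and the result should not depend on the value of T chosen.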

Rindler coordinates have the property that if a rod extends along
the X axis, and external forces are applied to it in just such a way
that every point on the rod has constant X, then it accelerates along
its own length without any stress. (See problem 7, p. 214.)

The diagonals are event horizons (p. 62). Their intersection lies 
along every constant-T line; cf. example 17, p. 34, and p. 73. 
7.2 Transformation of vectors


Now suppose we want to transform a vector whose components 
are expressed in the (T, X) coordinates into components expressed 
in (t, x). Our most basic example of a vector is a displacement
$(\Delta T, \Delta X)$, and if we make this an infinitesimal $(dT, dX)$ then we
don’t need to worry about the fact that the chart in figure a has
curves on it: close up, curves look like straight lines.[2] If we think
of the coordinate t as a function of two variables, $t = t(T, X)$, then
t is changing for two different reasons: its first input T changes, and
also its second input X. If t were only a function of one variable
$t(T)$, then the change in t would be given simply by the chain rule,
$dt = \frac{dt}{dT}\,dT$. Since it actually has two such reasons to change, we
add the two changes:


$$dt = \frac{\partial t}{\partial T}\,dT + \frac{\partial t}{\partial X}\,dX$$


The derivatives are partial derivatives, and these derivatives exist 
because, as we will always assume, the change of coordinates is 
smooth. An exactly analogous expression applies for dx. 


$$dx = \frac{\partial x}{\partial T}\,dT + \frac{\partial x}{\partial X}\,dX$$


Before we carry out the details of this calculation, let’s stop 
and note that the results so far are completely general. Since we 
have so far made no use of the actual equations for this particular 
change of coordinates, these expressions would apply to any such 
transformation, including the special cases we’ve encountered so far, 
such as Lorentz transformations and scaling. (For example, if we’d
been scaling by a factor $\alpha$, then the unmixed partial derivatives would
simply have equaled $\alpha$, and the mixed ones would have vanished.)
Furthermore, our definition of a vector is
that a vector is anything that transforms like a vector. Since we’ve 
established that the rules above apply to a displacement vector, we 
conclude that they would also apply to any other vector, say an 
energy-momentum vector. 

Returning to this specific example, application of the facts
$d\sinh u/du = \cosh u$ and $d\cosh u/du = \sinh u$ tells us that the vector

$$(dT, dX)$$

is transformed to:

$$(dt, dx) = (X\cosh T\,dT + \sinh T\,dX,\; X\sinh T\,dT + \cosh T\,dX)$$
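As a sanity check on this transformation law, the Python sketch below compares it with the actual change in (t, x) produced by a small displacement in (T, X). The function names and the sample numbers are illustrative assumptions, not anything taken from the text.

```python
import math

def to_minkowski(T, X):
    """Rindler (T, X) -> Minkowski (t, x)."""
    return X * math.sinh(T), X * math.cosh(T)

def transform_vector(T, X, vT, vX):
    """Transform vector components (vT, vX) at the point (T, X) into (vt, vx)."""
    vt = X * math.cosh(T) * vT + math.sinh(T) * vX
    vx = X * math.sinh(T) * vT + math.cosh(T) * vX
    return vt, vx

# A small displacement, transformed two ways.
T, X = 0.7, 1.5
dT, dX = 1e-6, -2e-6
t0, x0 = to_minkowski(T, X)
t1, x1 = to_minkowski(T + dT, X + dX)
dt, dx = transform_vector(T, X, dT, dX)
print(t1 - t0, dt)  # these agree up to second-order terms
print(x1 - x0, dx)
```

Because the same function acts on any pair of components, it can equally be applied to an energy-momentum vector rather than a displacement.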


[2] Here we make use of the fact that the change of coordinate was smooth, i.e.,
a diffeomorphism. Otherwise the curves could have kinks in them that would
still look like kinks under any magnification.

As an example of how this applies universally to any type of
vector, suppose that the observer aboard a spaceship with world-line
$(T, X) = (\tau, 1)$ has a favorite paperweight with mass m. According
to measurements carried out aboard her ship, its energy-momentum
vector is

$$(p^T, p^X) = (m, 0).$$

In the unaccelerated coordinates, this becomes

$$\begin{aligned}
(p^t, p^x) &= (X\cosh T\,p^T + \sinh T\,p^X,\; X\sinh T\,p^T + \cosh T\,p^X) \\
&= (mX\cosh T,\; mX\sinh T) \\
&= (m\cosh\tau,\; m\sinh\tau).
\end{aligned}$$

Since the functions cosh and sinh behave like $e^x$ for large x, we find
that after the astronaut has spent a reasonable amount of proper
time $\tau$ accelerating, the paperweight’s mass-energy and momentum
will have grown to the point where it’s an awesome weapon of mass 
destruction, capable of obliterating an entire galaxy. 

7.3 Transformation of the metric 

Continuing with the example of accelerated coordinates, let’s find 
what happens to the metric when we change from Minkowski coor- 
dinates. Minkowski coordinates are essentially defined so that the 
metric has the familiar form with coefficients +1 and —1. In relativ- 
ity, one often presents the metric by showing its result when applied 
to an infinitesimal displacement (dt,dx): 

$$ds^2 = dt^2 - dx^2$$

Here ds would represent proper time, in the case where the displacement
was timelike. Since we’ve already determined that

$$dt = X\cosh T\,dT + \sinh T\,dX$$

and

$$dx = X\sinh T\,dT + \cosh T\,dX,$$

we can simply substitute into the expression for ds in order to find
the form of the metric in (T, X) coordinates. Employing the identity
$\cosh^2 - \sinh^2 = 1$, we find

$$ds^2 = X^2\,dT^2 - dX^2.$$
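The substitution can be spot-checked numerically. This Python sketch (hypothetical function names, arbitrary sample values) evaluates the interval once from the Minkowski components and once from the Rindler form, and the two should agree for any choice of T, X, dT, and dX.

```python
import math

def differentials(T, X, dT, dX):
    """(dt, dx) in terms of (dT, dX), from the vector transformation law."""
    dt = X * math.cosh(T) * dT + math.sinh(T) * dX
    dx = X * math.sinh(T) * dT + math.cosh(T) * dX
    return dt, dx

def interval_minkowski(dt, dx):
    """ds^2 = dt^2 - dx^2."""
    return dt**2 - dx**2

def interval_rindler(X, dT, dX):
    """ds^2 = X^2 dT^2 - dX^2."""
    return X**2 * dT**2 - dX**2

T, X, dT, dX = 1.2, 0.8, 0.3, -0.1
lhs = interval_minkowski(*differentials(T, X, dT, dX))
rhs = interval_rindler(X, dT, dX)
print(lhs, rhs)
```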

The varying value of the dT 2 coefficient is in fact exactly the kind 
of gravitational time dilation effect whose existence we predicted in 
section 5.2.5, p. 122 based on the equivalence principle. The form 
of the metric inferred there was 

$$ds^2 \approx (1 + 2\Delta\Phi)\,dT^2 - dX^2,$$

where $\Delta\Phi$ is the difference in gravitational potential relative to some
reference height. One of the approximations employed was the assumption
that the range of heights X was small, but subject to
that approximation, the two results should agree. For convenience,
let’s consider observers in the region $X \approx 1$, where the acceleration
is approximately 1. Then $\Delta\Phi = \Phi(1 + \Delta X) - \Phi(1) \approx
(\text{acceleration})(\text{height}) \approx \Delta X$, so the time coefficient in the second
form of the metric is $\approx 1 + 2\Delta\Phi \approx 1 + 2\Delta X$. But to within the desired
level of approximation, this is the same as $X^2 = (1 + \Delta X)^2 \approx 1 + 2\Delta X$.

The procedure employed above works in general. To transform
the metric from coordinates $(t, x, y, z)$ to new coordinates $(t', x', y', z')$,
we obtain the unprimed coordinates in terms of the primed ones,
take differentials on both sides, and eliminate $t, \ldots, dt, \ldots$ in favor
of $t', \ldots, dt', \ldots$ in the expression for $ds^2$. We’ll see in section
9.2.4, p. 180, that this is an example of a more general transforma- 
tion law for tensors, mathematical objects that generalize vectors 
and covectors in the same way that matrices generalize row and col- 
umn vectors. A scalar, with no indices, is called a tensor of rank 0. 
Vectors and covectors, having one index, are called rank-1 tensors. 



b / Example 1 . 


A map projection Example 1 

Because the earth’s surface is curved, it is not possible to represent
it on a flat map without distortion. Let $\phi$ be the longitude,
$\theta$ the angle measured down from the north pole (known as the
colatitude), both measured in radians, and let a be the earth’s radius.
Then by the definition of radian measure, an infinitesimal
north-south displacement by $d\theta$ is a distance $a\,d\theta$. A point at a
given colatitude $\theta$ lies at a distance $a\sin\theta$ from the axis, so for an
infinitesimal east-west distance we have $a\sin\theta\,d\phi$. For convenience,
let the units be chosen such that a = 1. Then the metric,
with signature ++, is

$$ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2.$$

One of the many possible ways of forming a flat map is the Lam- 
bert cylindrical projection, 

$$x = \phi$$
$$y = \cos\theta,$$

shown in figure b. If we see a distance on the map and want
to know how far it actually is on the earth’s surface, we need
to transform the metric into the (x, y) coordinates. The inverse
coordinate transformation is

$$\phi = x$$
$$\theta = \cos^{-1} y.$$

Taking differentials on both sides, we get

$$d\phi = dx$$
$$d\theta = -\frac{dy}{\sqrt{1 - y^2}}.$$

We take the metric and eliminate $\theta$, $\phi$, $d\theta$, and $d\phi$, finding

$$ds^2 = (1 - y^2)\,dx^2 + \frac{1}{1 - y^2}\,dy^2.$$

In figure b, the polka-dot pattern is made of figures that are ac- 
tually circles, all of equal size, on the earth’s surface. Since they 
are fairly small, we can approximate y as having a single value 
for each circle, which means that they are represented on the 
flat map as approximate ellipses with their east-west dimensions 
having been stretched by $(1 - y^2)^{-1/2}$ and their north-south ones
shrunk by $(1 - y^2)^{1/2}$. Since these two factors are reciprocals of
one another, the area of each ellipse is the same as the area of 
the original circle, and therefore the same as those of all the other 
ellipses. They are a visual representation of the metric, and they 
demonstrate the equal-area property of this projection. 
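A short numerical illustration of this example, written as a Python sketch with hypothetical function names and a unit-radius sphere: it compares a small distance computed on the sphere with the same distance computed from the map using the transformed metric, and checks that the product of the two metric coefficients is 1, which is the equal-area property.

```python
import math

def lambert_metric(y):
    """Coefficients (g_xx, g_yy) of the metric in Lambert coordinates."""
    return 1 - y**2, 1 / (1 - y**2)

def sphere_distance_sq(theta, dtheta, dphi):
    """ds^2 on the unit sphere in (theta, phi) coordinates."""
    return dtheta**2 + math.sin(theta)**2 * dphi**2

def map_distance_sq(y, dx, dy):
    """ds^2 computed from map coordinates with the transformed metric."""
    g_xx, g_yy = lambert_metric(y)
    return g_xx * dx**2 + g_yy * dy**2

theta, dtheta, dphi = 1.0, 1e-5, 2e-5
y = math.cos(theta)
dx = dphi
dy = math.cos(theta + dtheta) - math.cos(theta)
on_sphere = sphere_distance_sq(theta, dtheta, dphi)
on_map = map_distance_sq(y, dx, dy)
print(on_sphere, on_map)

g_xx, g_yy = lambert_metric(y)
print(math.sqrt(g_xx * g_yy))  # area scale factor of the map
```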


7.4 Summary of transformation laws 

Having worked through one example in detail, let’s progress from 
the specific to the general. In the Einstein concrete index notation, 
let coordinates $(x^0, x^1, x^2, x^3)$ be transformed to new coordinates
$(x'^0, x'^1, x'^2, x'^3)$. Then vectors transform according to the rule

$$v'^\mu = v^\kappa \frac{\partial x'^\mu}{\partial x^\kappa}, \qquad (1)$$

where the Einstein summation convention implies a sum over the
repeated index $\kappa$. By the same reasoning as in section 6.4.2, p. 132,
the transformation for a covector $\omega$ is

$$\omega'_\mu = \omega_\kappa \frac{\partial x^\kappa}{\partial x'^\mu}. \qquad (2)$$

Note the inversion of the partial derivative in one equation compared 
to the other. Because these equations describe a change from one 
coordinate system to another, they clearly depend on the coordinate 
system, so we use Greek indices rather than the Latin ones that 
would indicate a coordinate-independent abstract index equation. 
The letter $\mu$ in these equations always appears as an index referring
to the new coordinates, $\kappa$ to the old ones. For this reason,
we can get away with dropping the primes and writing, e.g.,
$v^\mu = v^\kappa\,\partial x'^\mu/\partial x^\kappa$ rather than $v'^\mu$, counting on context to show that
$v^\mu$ is the vector expressed in the new coordinates, $v^\kappa$ in the old ones.
This becomes especially natural if we start working in a specific coordinate
system where the coordinates have names. For example,
if we transform from coordinates (t, x, y, z) to (a, b, c, d), then it is
clear that $v^t$ is expressed in one system and $v^c$ in the other.

In equation (2), $\mu$ appears as a subscript on the left side of the
equation, but as a superscript on the right. This would appear to
violate the grammatical rules given on p. 140, but the interpretation
here is that in expressions of the form $\partial/\partial x^i$ and $\partial/\partial x_i$, the
superscripts and subscripts should be understood as being turned
upside-down. Similarly, (1) appears to have the implied sum over $\kappa$
written ungrammatically, with both $\kappa$’s appearing as superscripts.
Normally we only have implied sums in which the index appears
once as a superscript and once as a subscript. With our new rule
for interpreting indices on the bottom of derivatives, the implied sum
is seen to be written correctly. This rule is similar to the one for
analyzing the units of derivatives written in Leibniz notation, with,
e.g., $d^2x/dt^2$ having units of meters per second squared. That is,
the flipping of the indices like this is required for consistency so 
that everything will work out properly when we change our units of 
measurement, causing all our vector components to be rescaled. 

The identity transformation Example 2 

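In the case of the identity transformation $x'^\mu = x^\mu$, equation (1)
clearly gives $v' = v$, since all the mixed partial derivatives $\partial x'^\mu/\partial x^\kappa$
with $\mu \ne \kappa$ are zero, and all the derivatives for $\kappa = \mu$ equal 1.

In equation (2), it is tempting to write

$$\frac{\partial x^\kappa}{\partial x'^\mu} = \frac{1}{\partial x'^\mu/\partial x^\kappa} \qquad \text{(wrong!)},$$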

but this would give infinite results for the mixed terms! Only in the 
case of functions of a single variable is it possible to flip deriva- 
tives in this way; it doesn’t work for partial derivatives. To evalu- 
ate these partial derivatives, we have to invert the transformation 
(which in this example is trivial to accomplish) and then take the 
partial derivatives. 
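The warning about flipping partial derivatives is easy to see in a case that is less trivial than the identity. The Python sketch below is an illustration under assumptions of its own: it uses the polar-to-Cartesian transformation, with hypothetical helper names, and compares the correct $\partial r/\partial x$, obtained by inverting the whole matrix of partial derivatives, with the naive reciprocal $1/(\partial x/\partial r)$.

```python
import math

def jacobian_polar_to_cartesian(r, theta):
    """Matrix of partial derivatives; rows are (x, y), columns are (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

r, theta = 2.0, math.pi / 3
J = jacobian_polar_to_cartesian(r, theta)
J_inv = inverse_2x2(J)   # rows are (r, theta), columns are (x, y)

dr_dx = J_inv[0][0]      # correct value, equal to cos(theta)
naive = 1 / J[0][0]      # reciprocal of dx/dr, equal to 1/cos(theta)
print(dr_dx, naive)
```

The two numbers differ, because inverting the transformation mixes all of the partial derivatives rather than inverting them one at a time.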

Polar coordinates Example 3 

None of the techniques discussed here are particular to relativity. 
For example, consider the transformation from polar coordinates 
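$(r, \theta)$ in the plane to Cartesian coordinates

$$x = r\cos\theta$$
$$y = r\sin\theta.$$

A bug sits on the edge of a phonograph turntable, at $(r, \theta) = (1, 0)$.
The turntable rotates clockwise, giving the bug a velocity vector
$v^\kappa = (v^r, v^\theta) = (0, -1)$, i.e., the angular velocity is one radian per
second in the negative (clockwise) direction. Let’s find the
bug’s velocity vector in Cartesian coordinates. The transformation
law for vectors gives

$$v^\mu = v^\kappa \frac{\partial x^\mu}{\partial x^\kappa}.$$

Expanding the implied sum over the repeated index $\kappa$, we have

$$\begin{aligned}
v^x &= v^\kappa \frac{\partial x}{\partial x^\kappa} \\
&= v^r \frac{\partial x}{\partial r} + v^\theta \frac{\partial x}{\partial \theta} \\
&= (0)\frac{\partial x}{\partial r} + (-1)\frac{\partial x}{\partial \theta} \\
&= r\sin\theta \\
&= 0.
\end{aligned}$$

For the y component,

$$\begin{aligned}
v^y &= v^r \frac{\partial y}{\partial r} + v^\theta \frac{\partial y}{\partial \theta} \\
&= (0)\frac{\partial y}{\partial r} + (-1)\frac{\partial y}{\partial \theta} \\
&= -r\cos\theta \\
&= -1.
\end{aligned}$$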
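The same computation can be written as a few lines of Python. This is a sketch with a hypothetical function name, using the analytic partial derivatives of $x = r\cos\theta$ and $y = r\sin\theta$.

```python
import math

def polar_to_cartesian_velocity(r, theta, v_r, v_theta):
    """Apply v^mu = v^kappa (dx^mu/dx^kappa) for polar -> Cartesian."""
    dx_dr, dx_dtheta = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dtheta = math.sin(theta), r * math.cos(theta)
    v_x = v_r * dx_dr + v_theta * dx_dtheta
    v_y = v_r * dy_dr + v_theta * dy_dtheta
    return v_x, v_y

# The bug at (r, theta) = (1, 0) with (v^r, v^theta) = (0, -1).
v_x, v_y = polar_to_cartesian_velocity(1.0, 0.0, 0.0, -1.0)
print(v_x, v_y)
```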


7.5 Inertia and rates of change 

Suppose that we describe a flying bullet in polar coordinates. We
neglect the vertical dimension, so the bullet’s motion is linear. If the
bullet has a displacement of $(\Delta r_1, \Delta\theta_1)$ in a short time interval $\Delta t$,
then clearly at a later point in its motion, during an equal interval,
it will have a displacement $(\Delta r_2, \Delta\theta_2)$ with two different numbers
inside the parentheses. This isn’t because its velocity or momentum
really changed. It’s because the coordinate system is curvilinear.
There are three ways to get around this: 

1. Use only Minkowski coordinates. 

2. Instead of characterizing inertial motion as motion with con- 
stant velocity components, we can instead characterize it as 
motion that maximizes the proper time (section 2.4.2, p. 48). 

3. Define a correction term to be added when taking the derivative
of a vector or covector expressed in non-Minkowski coordinates.

These issues become more acute in general relativity, where curvature
of spacetime can make option 1 impossible. Option 3, called the
covariant derivative, is discussed in optional section 9.4 on p. 193. 
If you aren’t going to read that section, just keep in mind that in 
non-Minkowski coordinates, you cannot naively use changes in the 
components of a vector as a measure of a change in the vector itself. 
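To make the bullet example concrete, here is a small Python sketch. The straight-line path, the step size, and the function names are all illustrative assumptions. The bullet moves with constant Cartesian velocity, yet its polar displacements over equal time intervals differ from one interval to the next.

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)

def polar_steps(x0, y0, vx, vy, dt, n):
    """Polar displacements (dr, dtheta) over n equal time intervals dt."""
    steps = []
    for k in range(n):
        r_a, th_a = to_polar(x0 + vx * k * dt, y0 + vy * k * dt)
        r_b, th_b = to_polar(x0 + vx * (k + 1) * dt, y0 + vy * (k + 1) * dt)
        steps.append((r_b - r_a, th_b - th_a))
    return steps

# Inertial motion along the line x = 1, starting at y = 0.
steps = polar_steps(1.0, 0.0, 0.0, 1.0, 0.5, 3)
for dr, dtheta in steps:
    print(dr, dtheta)
```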


7.6 * Volume, orientation, and the Levi-Civita 
tensor 

This optional section introduces some geometrical machinery that 
is used in both special and general relativity. 

7.6.1 Volume 

Desirable properties 

In 3+1 dimensions, we have a natural way of defining four-dimensional
volume, which is to pick a frame of reference and let
the element of volume be $dt\,dx\,dy\,dz$ in the Minkowski coordinates
of that frame. Although this definition of 4-volume is stated in terms
of certain coordinates, it turns out to be Lorentz-invariant (section
2.5, p. 49). It also has the following desirable properties, which we
state for an arbitrary value of m from 1 to 4:

V1. Any two m-volumes can be compared in terms of their ratio.

V2. For any m nonzero vectors, the m-volume of the parallelepiped
they span is nonzero if and only if the vectors are linearly
independent (that is, if none of them can be expressed in terms of
the others using scalar multiplication and vector addition).

We would also like to have convenient methods for working with
three-volume, two-volume (area), and one-volume (length). But the
m-volumes for m < 4 give us headaches if we try to define them so
that they obey both V1 and V2. For example, the obvious way to
define length (m = 1) is to use the metric, but then lightlike vectors
would violate V2.


Affine measure 

If we’re willing to abandon V1, then the following approach succeeds.
Consider the m = 1 case. We ignore the metric completely
and exploit the fact that in special relativity, spacetime is flat (postulate
P2, p. 46), so that parallelism works the same way as in
Euclidean geometry. Let $\ell$ be a line, and suppose we want to define
a number system on this line that measures how far apart events
are. Depending on the type of line, this could be a measurement of
time, of spatial distance, or a mixture of the two. First we arbitrarily
single out two distinct points on $\ell$ and label them 0 and 1, as in
figure c. Next, pick some auxiliary point $q_0$ not lying on $\ell$. Construct
$q_0 q_1$ parallel to 01 and $1q_1$ parallel to $0q_0$, forming the
parallelogram shown in the figure. Continuing in this way, we have
a scaffolding of parallelograms adjacent to the line, determining an
infinite lattice of points 1, 2, 3, ... on the line, which represent the
positive integers. Fractions can be defined in a similar way. For
example, $\frac{1}{2}$ is defined as the point such that when the initial lattice
segment $0\frac{1}{2}$ is extended by the same construction, the next point on
the lattice is 1. The continuously varying variable constructed in
this way is called an affine parameter. The time measured by a free-falling
clock is an example of an affine parameter, as is the distance
measured by the tick marks on a free-falling ruler. An affine parameter
can only be defined along a straight world-line, not an arbitrary
curve. The affine measurement of 1-volume violates V1, because it
only allows us to compare distances that lie on $\ell$ or parallel to it.
On the other hand, it has the advantage over metric measurement
that it allows us to measure lengths along lightlike lines.

c / Using parallelism to define 1-volume.

d / The area of the viola can be determined by counting the
parallelograms formed by the lattice. The area can be determined
to any desired precision, by dividing the parallelograms into
fractional parts that are as small as necessary.

e / Linearity of area. Doubling the vector a doubles the area.

f / The viola has a different area when measured using a different
parallelogram as the unit.

Figure d shows how to define an affine measure of 2- volume, and 
a similar method works for 3-volume. 

Linearity 

Suppose that a parallelogram is formed with vectors a and b as
two of its sides. If we double a, then the area doubles as well,

area(2a, b) = 2 area(a, b). 

In general, if we scale either of the vectors by a factor c, the area 
scales by the same factor, provided that we set some rule for han- 
dling signs — an issue that we’ll postpone until section 7.6.2. Some- 
thing similar happens when we add two vectors, e.g., 

area(a, b + c) = area(a, b) + area(a, c), 

again postponing issues with signs. We refer to these properties as 
linearity of the affine 2-volume. Any sensible measure of m-volume 
should have similar linearity properties. 
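These linearity properties can be illustrated with a signed area computed from components. In the Python sketch below, the function name is hypothetical and the components are assumed to be taken relative to whatever pair of basis vectors defines the unit parallelogram, so no metric is involved. The sign convention anticipates the discussion of orientation.

```python
def area(a, b):
    """Signed affine area of the parallelogram spanned by a and b,
    in units of the parallelogram spanned by the basis vectors."""
    return a[0] * b[1] - a[1] * b[0]

a, b, c = (3, 1), (1, 2), (-2, 5)
b_plus_c = (b[0] + c[0], b[1] + c[1])
two_a = (2 * a[0], 2 * a[1])

print(area(two_a, b), 2 * area(a, b))               # scaling
print(area(a, b_plus_c), area(a, b) + area(a, c))   # addition
print(area(b, a))                                   # swapping the vectors flips the sign
```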

Change of basis 

Because we have not made use of the metric so far, all of our 
measures of area have been relative rather than absolute. As shown 
in figure f, they depend on what parallelogram we choose as our unit 
of area. The unit cell in f/2 is smaller than the one in f/1, for two 
reasons: the vectors defining the edges are shorter, and the angle 
between them is smaller. Words like “shorter” and “angle” show 
us resorting to metric measurement, but we could also perform the 
comparison without using the metric, simply by using parallelogram 
1 to measure parallelogram 2, or 2 to measure 1. If we think of such 
a pair of vectors as basis vectors for the plane, then switching our 
choice of unit parallelogram is equivalent to a change of basis. Areas 
change in proportion to the determinant of the matrix specifying the 
change of basis. 
A halfling basis Example 4

Suppose that a' = a/2, and b' = b/2. The change of basis from
the unprimed pair to the primed pair is given by the matrix

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},$$

which has determinant 4. Scaling down both basis vectors by a
factor of 2 has caused a reduction by a factor of 4 in the area 
of the unit parallelogram. If we use the primed parallelogram to 
measure other areas, then all the areas will come out bigger by a 
factor of 4. 

Rotations and Lorentz boosts are changes of basis. They have
determinants equal to 1, i.e., they preserve spacetime volume.
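A quick numerical check of these determinants, as a Python sketch with hypothetical function names. The boost is written in 1+1 dimensions with c = 1, and the halfling matrix of example 4 is included for contrast.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def boost_matrix(v):
    """Lorentz boost in 1+1 dimensions acting on (t, x), with c = 1."""
    gamma = 1 / math.sqrt(1 - v**2)
    return [[gamma, -gamma * v], [-gamma * v, gamma]]

def rotation_matrix(angle):
    """Rotation of the plane by the given angle in radians."""
    return [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle),  math.cos(angle)]]

print(det2(boost_matrix(0.6)))       # 1, up to rounding
print(det2(rotation_matrix(0.4)))    # 1, up to rounding
print(det2([[2, 0], [0, 2]]))        # the halfling change of basis
```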

7.6.2 Orientation 

As shown in figure g, linearity of area requires that some areas 
be assigned negative values. If we compare the areas +1 and —1, we 
see that the only difference is one of orientation, or handedness. In 
the case to which we have arbitrarily assigned area +1, vector b lies 
counterclockwise from vector a, but when a is flipped, the relative 
orientation becomes clockwise. 

If you’ve had the usual freshman physics background, then you’ve
seen this issue dealt with in a particular way, which is that we assume
a third dimension to exist, and define the area to be the vector
cross product a × b, which is perpendicular to the plane inhabited
by a and b. The trouble with this approach is that it only works
in three dimensions. In four dimensions, suppose that a lies along
the x axis, and b along the t axis. Then if we were to define a × b,
it should be in a direction perpendicular to both of these, but we
have more than one such direction. We could pick anything in the
y-z plane.

To get started on this issue in m dimensions, where m does
not necessarily equal 3, we can consider the m-volume of the
m-dimensional parallelepiped spanned by m vectors. For example,
suppose that in 4-dimensional spacetime we pick our m vectors to 
be the unit vectors lying along the four axes of the Minkowski co- 
ordinates, t, x, y, and z. From experience with the vector cross 
product, which is anticommutative, we expect that the sign of the 
result will depend on the order of the vectors, so let’s take them in 
that order. Clearly there are only two reasonable values we could 
imagine for this volume: +1 or —1. The choice is arbitrary, so we 
make an arbitrary choice. Let’s say that it’s +1 for this order. This 
amounts to choosing an orientation for spacetime. 

A hidden and nontrivial assumption was that once we made this
choice at one point in spacetime, it could be carried over to other
regions of spacetime in a consistent way. This need not be the case,
as suggested in figure h. However, our topic at the moment is special
relativity, and as discussed briefly on p. 48, it is usually assumed in
special relativity that spacetime is topologically trivial, so that this
issue arises only in general relativity, and only in spacetimes that
probably are not realistic models of our universe.

h / A Möbius strip is not an orientable surface.

i / Tullio Levi-Civita (1873-1941) worked on models of number systems
possessing infinitesimals and on differential geometry. He invented
the tensor notation, which Einstein learned from his textbook. He was
appointed to prestigious endowed chairs at Padua and the University of
Rome, but was fired in 1938 because he was a Jew and an anti-fascist.

Since 4-volume is invariant under rotations and Lorentz trans- 
formations, our choice of an orientation suffices to fix a definition 
of 4-volume that is a Lorentz invariant. If vectors a, b, c, and d 
span a 4-parallelepiped, then the linearity of volume is expressed by 
saying that there is a set of coefficients such that 

$$V = \epsilon_{ijkl}\, a^i b^j c^k d^l.$$

Notating it this way suggests that we interpret it as abstract index
notation, in which case the lack of any indices on V means that it
is not just a Lorentz invariant but also a scalar.[3]

Halfling coordinates Example 5 

Let $(t, x, y, z)$ be Minkowski coordinates, and let $(t', x', y', z') =
(2t, 2x, 2y, 2z)$. Let’s consider how each of the factors in our volume
equation is affected as we do this change of coordinates.

$$\underbrace{V}_{\text{no change}} = \underbrace{\epsilon_{ijkl}}_{\times 1/16}\,
\underbrace{a^i}_{\times 2}\,\underbrace{b^j}_{\times 2}\,
\underbrace{c^k}_{\times 2}\,\underbrace{d^l}_{\times 2}$$

Since our convention is that V is a scalar, it doesn’t change under
a change of coordinates. This forces us to say that the components
of $\epsilon$ change by a factor of 1/16 in this example.

The result of example 5 tells us that under our convention that
volume is a scalar, the components of $\epsilon$ must change when we change
coordinates. One could argue that it would be more logical to think
of the transformation in this example as a change of units, in which
case the value of V would be different in the new units; this is a
possible alternative convention, but it would have the disadvantage
of making it impossible to read off the transformation properties of
an object from the number and position of its indices. Under our
convention, we can read off the transformation properties in this
way. Although section 7.4 only presented these properties in the case
of tensors of rank 0 and 1, deferring the general description of higher-rank
tensors to sec. 9.2.4, p. 180, $\epsilon$’s transformation properties are,
as implied by its four subscripts, those of a tensor of rank 4. Different
authors use different conventions regarding the definition of $\epsilon$, which
was originally described by the mathematician Levi-Civita. Since
by our convention $\epsilon$ is a tensor, we refer to it as the Levi-Civita
tensor. In other conventions, where $\epsilon$ is not a tensor, it may be
referred to as the Levi-Civita symbol. Since the notation is not
standardized, I will occasionally put a reminder next to important
equations containing $\epsilon$ stating that this is the tensorial $\epsilon$.

[3] For the distinction, see p. 127.

The Levi-Civita tensor has lots and lots of indices. Scary! Imagine
the complexity of this beast. (Sob.) We have four choices for
the first index, four for the second, and so on, so that the total num- 
ber of components is 256. Wait, don’t reach for the kleenex. The 
following example shows that this complexity is illusory. 

Volume in Minkowski coordinates Example 6 

We’ve set up our definitions so that for the parallelepiped t, x, y, 
z, we have V = +1 . Therefore 


$$\epsilon_{txyz} = +1$$

by definition, and because 4-volume is Lorentz invariant, this holds
for any set of Minkowski coordinates.

If we interchange x and y to make the list t, y, x, z, then as in
figure g, the volume becomes -1, so

$$\epsilon_{tyxz} = -1.$$

Suppose we take the edges of our parallelepiped to be t, x, x,
z, with y omitted and x duplicated. These four vectors are not
linearly independent, so our parallelepiped is degenerate and has
zero volume.

$$\epsilon_{txxz} = 0.$$

From these examples, we see that once any element of $\epsilon$ has
been fixed, all of the others can be determined as well. The rule is 
that interchanging any two indices flips the sign, and any repeated 
index makes the result zero. 
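The rule can be turned directly into code. In this Python sketch the function name and the labeling of (t, x, y, z) as (0, 1, 2, 3) are assumptions made for illustration. Each component is computed from the parity of the index list, and counting the components that are nonzero shows how little independent information the 256 components carry.

```python
from itertools import product

def levi_civita(indices, positive=1):
    """Component of epsilon for a tuple of indices.

    `positive` is the value assigned to the sorted, positively oriented
    order, e.g. +1 in Minkowski coordinates.
    """
    if len(set(indices)) < len(indices):
        return 0  # a repeated index gives zero
    # Each interchange of two indices flips the sign, so count inversions.
    inversions = sum(1 for i in range(len(indices))
                     for j in range(i + 1, len(indices))
                     if indices[i] > indices[j])
    return positive if inversions % 2 == 0 else -positive

t, x, y, z = 0, 1, 2, 3
print(levi_civita((t, x, y, z)))   # +1
print(levi_civita((t, y, x, z)))   # -1
print(levi_civita((t, x, x, z)))   # 0

nonzero = sum(1 for idx in product(range(4), repeat=4) if levi_civita(idx) != 0)
print(nonzero, "of", 4**4, "components are nonzero")
```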

Example 6 shows that the fancy symbol $\epsilon_{ijkl}$, which looks like
a secret Mayan hieroglyph invoking 256 different numbers, actually
encodes only one number’s worth of information; every component
of the tensor either equals this number, or minus this number, or
zero. Suppose we’re working in some set of coordinates, which may
not be Minkowski, and we want to find this number. A complicated
way to find it would be to use the tensor transformation law for a
rank-4 tensor (sec. 9.2.4, p. 180). A much simpler way is to make use
of the determinant of the metric, discussed in example 1 on p. 127.
For a list of coordinates ijkl that are sorted out in the order that
we define to be a positive orientation, the result is simply
$\epsilon_{ijkl} = \sqrt{|\det g|}$. The absolute value sign is needed because a relativistic
metric has a negative determinant.

Cartesian coordinates and their halfling versions Example 7

Consider Euclidean coordinates in the plane, so that the metric is
a 2 × 2 matrix, and $\epsilon_{ij}$ has only two indices. In standard Cartesian
coordinates, the metric is g = diag(1, 1), which has det g = 1.
The Levi-Civita tensor therefore has $\epsilon_{xy} = +1$, and its other three
components are uniquely determined from this one by the rules
discussed in example 6. (We could have flipped all the signs if we
had wanted to choose the opposite orientation for the plane.) In
matrix form, these rules result in

$$\epsilon = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Now transform to coordinates $(x', y') = (2x, 2y)$. In these coordinates,
the metric is g' = diag(1/4, 1/4), with det g' = 1/16, so that
$\epsilon_{x'y'} = 1/4$, or in matrix form,

$$\epsilon' = \begin{pmatrix} 0 & 1/4 \\ -1/4 & 0 \end{pmatrix}.$$
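The two routes to $\epsilon'$ can be compared numerically. The Python sketch below assumes the standard transformation law for a tensor with two lower indices, in which each index picks up one factor of $\partial x^\kappa/\partial x'^\mu$; the text defers that law to sec. 9.2.4, so treat its use here as an assumption. The function name is hypothetical.

```python
import math

def transform_lower_rank2(tensor, inv_jacobian):
    """Transform a tensor with two lower indices.

    inv_jacobian[k][m] holds the partial derivative of old coordinate k
    with respect to new coordinate m.
    """
    n = len(tensor)
    return [[sum(tensor[k][l] * inv_jacobian[k][m] * inv_jacobian[l][nu]
                 for k in range(n) for l in range(n))
             for nu in range(n)]
            for m in range(n)]

# Halfling coordinates: x' = 2x and y' = 2y, so dx/dx' = dy/dy' = 1/2.
inv_jacobian = [[0.5, 0.0], [0.0, 0.5]]
g = [[1.0, 0.0], [0.0, 1.0]]
eps = [[0.0, 1.0], [-1.0, 0.0]]

g_new = transform_lower_rank2(g, inv_jacobian)
eps_new = transform_lower_rank2(eps, inv_jacobian)
det_g_new = g_new[0][0] * g_new[1][1] - g_new[0][1] * g_new[1][0]

print(g_new)                 # diag(1/4, 1/4)
print(eps_new)               # off-diagonal entries +1/4 and -1/4
print(math.sqrt(det_g_new))  # matches the x'y' component of eps_new
```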


Polar coordinates Example 8 

In polar coordinates $(r, \theta)$, the metric is $g = \text{diag}(1, r^2)$ (problem
1, p. 160), which has determinant $r^2$. The Levi-Civita tensor is

$$\epsilon = \begin{pmatrix} 0 & r \\ -r & 0 \end{pmatrix}$$

(taking the same orientation as in example 7).

Area of a circle Example 9

Let’s find the area of the unit circle. Its (signed) area is

$$A = \int \text{2-volume}(dr, d\theta),$$

where the order of $dr$ and $d\theta$ is chosen so that, with the orientation
we’ve been using for the plane, the result will come out positive.
Using the definition of the Levi-Civita tensor, we have

$$\begin{aligned}
A &= \int \epsilon_{r\theta}\, dx^r\, dx^\theta \\
  &= \int_{r=0}^{1} \int_{\theta=0}^{2\pi} r\, dr\, d\theta \qquad \text{[example 8]} \\
  &= \pi.
\end{aligned}$$
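The same integral can be approximated by brute force, which makes the role of $\epsilon_{r\theta} = r$ as the area of a coordinate cell explicit. This is only a sketch; the function name and the grid sizes are arbitrary choices of mine.

```python
import math

def unit_circle_area(n_r=200, n_theta=200):
    """Sum eps_{r theta} dr dtheta over a grid of polar coordinate cells."""
    dr = 1.0 / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr        # midpoint of the radial cell
        eps_r_theta = r           # sqrt(det g) for g = diag(1, r^2)
        for _ in range(n_theta):
            total += eps_r_theta * dr * dtheta
    return total

print(unit_circle_area(), math.pi)
```

Because the integrand is linear in r, the midpoint rule reproduces $\pi$ up to rounding error.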


7.6.3 The 3-volume covector 

Consider the volume of a three-dimensional subspace of four-dimensional
spacetime. Linearity leads to an especially simple characterization
of the 3-volume. Let a 3-volume be defined by the parallelepiped
spanned by vectors a, b, and c. If we threw in a fourth
vector d, we would have a 4-volume, and 4-volume is a scalar. This
4-volume would depend in a linear way on all four vectors, and in
particular it would depend linearly on d. But this means we have
a scalar that is a linear function of a vector, and such a function
is exactly what we mean by a covector. We can therefore define a
volume covector S according to

$$S_l d^l = \text{4-volume}(a, b, c, d)$$

or

$$S_l = \epsilon_{ijkl}\, a^i b^j c^k. \qquad \text{[tensorial } \epsilon\text{]}$$


The volume covector collects the information about the volume of 
the 3-parallelepiped, encapsulating it in a convenient form with 
known transformation properties. In particular, the statement and 
proof of Gauss’s theorem in 3 + 1 dimensions are greatly simpli- 
fied by the use of this tool (p. 190). The 3-volume covector, un- 
like the affine 3-volume, is defined in an absolute sense rather than 
in relation to some parallelepiped arbitrarily chosen as a standard. 
Both the covector and the affine volume fail to satisfy the
ratio-comparison property V1 on p. 151, since we can’t compare volumes
unless they lie in parallel 3-planes.


We’ve been visualizing covectors in n dimensions as stacks of
$(n-1)$-dimensional planes (figure a/2, p. 129; figure d/2, p. 136).
The 3-volume covector should therefore be visualized as a stack
of 3-planes in a four-dimensional space. Since most of us can’t
visualize things very well in four dimensions, figure j omits one of the
dimensions, so that the 3-surfaces appear as two-dimensional planes.
The small hand j/1 has a certain 3-volume, and the covector that
measures it is represented by the stack of 3-planes parallel to it,
j/2. The bigger hand j/3 has twice the 3-volume, and its covector
is represented by a stack of planes with half the spacing.

If we step down from four dimensions to three, then the volume
covector formed by vectors u and v becomes the vector cross product
$S = u \times v$, i.e., $S_k = \epsilon_{ijk} u^i v^j$.

A vector cross product Example 10

Consider Euclidean 3-space in Cartesian coordinates. We know
from freshman physics that

$$\hat{z} = \hat{x} \times \hat{y}.$$

Reexpressing this in the notation above, we have $u^x = 1$, $v^y = 1$,
and zero for all the other components of u and v. Since the
Levi-Civita tensor vanishes if we have any duplicated indices,
its only nonvanishing component that can be relevant here is
$\epsilon_{xyz} = 1$. (Here we assume the standard right-handed orientation
for Cartesian coordinates, and we make use of the fact that
g = diag(1, 1, 1), so that det g = 1.) The result is

$$S_z = \epsilon_{xyz} u^x v^y = 1,$$

as expected. (It doesn’t matter here whether we talk about $S_z$ or
$S^z$, because with this metric, raising and lowering indices doesn’t
change the components of a vector.)
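Example 10 can be spelled out as an explicit sum over indices. The sketch below assumes Cartesian coordinates with det g = 1 and the right-handed orientation; the helper names perm_sign and cross are mine.

```python
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation of 0..n-1, found by counting inversions."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

def cross(u, v):
    """S_k = eps_ijk u^i v^j in Cartesian coordinates, where det g = 1."""
    s = [0.0, 0.0, 0.0]
    # Components of eps with a repeated index vanish, so only the
    # permutations of (0, 1, 2) contribute to the sum.
    for i, j, k in permutations(range(3)):
        s[k] += perm_sign((i, j, k)) * u[i] * v[j]
    return s

x_hat = [1.0, 0.0, 0.0]
y_hat = [0.0, 1.0, 0.0]
print(cross(x_hat, y_hat))
print(cross(y_hat, x_hat))
```

Swapping the two vectors flips the sign of the result, which is the antisymmetry of $\epsilon$ at work.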



j / Interpretation of the 3-volume
covector.

Classification of 3-surfaces


A useful application of the 3-volume covector is in classifying 
3-surfaces by how they relate to the light cone. If I nail together 
three sticks, all at right angles to one another, then I can consider 
them as a set of basis vectors spanning a three-dimensional space 
of events. This three-space is flat, so we can call it a hyperplane 
- or just a plane if, as throughout this section, there is no danger 
of forgetting that it has three dimensions rather than two. All of 
the events in this plane are simultaneous in my frame of reference. 
None of these facts depends on the use of right angles; we just need 
to make sure that the sticks don’t all lie in the same plane. 

The business of a physicist is ultimately to make predictions. 
That is, if given a set of initial conditions, we can say how our sys- 
tem will evolve through time. These initial conditions are in prin- 
ciple measured throughout all of space, and a plane of simultaneity 
would be a natural choice for the set of points at which to take the 
measurements. A surface used for this purpose is called a Cauchy 
surface. 

If a plane is a surface of simultaneity according to some observer, 
then we call it spacelike. Any particle’s world-line must intersect 
such a plane exactly once, and this is why it works as a Cauchy 
surface: we are guaranteed to detect the particle, so that we can 
account for its effect on the evolution of the cosmos. We could take 
a spacelike plane and reorient it. For a small enough change in 
the orientation (that is, a change that could be described by small 
enough changes in the basis vectors), it will remain spacelike. 

When a plane is not spacelike, and remains so under any sufficiently
small change in orientation, we call it timelike. In Minkowski
coordinates, an example would be the t-x-y plane. A given particle’s
world-line might never cross such a surface, and therefore a timelike
plane cannot be used as a Cauchy surface.

A plane that is neither spacelike nor timelike is called light- 
like. An example is the surface defined by the equation x = t in 
Minkowski coordinates. 

The above classification can be stated very succinctly by using
the 3-volume covector defined in section 7.6.3. A plane is spacelike,
lightlike, or timelike, respectively, if the regions it contains
are described by 3-volume covectors that are timelike, lightlike, or
spacelike. A surface that is smooth but not necessarily flat can
be described locally according to these categories by considering
its tangent plane. For example, a light cone is lightlike at each of
its points, and since it is lightlike everywhere, we call it a lightlike
surface. The event horizon of a black hole is also a lightlike surface.
Any spacelike surface, whether curved or flat, can be used as
a Cauchy surface.
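The classification rule can be tried out numerically. The following sketch assumes Minkowski coordinates in the order (t, x, y, z), the +--- signature used for the metrics in this book, and the orientation in which $\epsilon_{txyz} = +1$. The function names and the three sample planes are my own choices for illustration.

```python
from itertools import permutations

ETA = [1.0, -1.0, -1.0, -1.0]   # diagonal of the Minkowski metric, order (t, x, y, z)

def perm_sign(p):
    """Sign of a permutation of 0..n-1, found by counting inversions."""
    sign = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                sign = -sign
    return sign

def volume_covector(a, b, c):
    """S_l = eps_ijkl a^i b^j c^k in Minkowski coordinates, where |det g| = 1."""
    s = [0.0] * 4
    for i, j, k, l in permutations(range(4)):
        s[l] += perm_sign((i, j, k, l)) * a[i] * b[j] * c[k]
    return s

def classify_plane(a, b, c, tol=1e-12):
    """Classify the 3-plane spanned by a, b, c from the norm of its covector."""
    s = volume_covector(a, b, c)
    # The inverse of a diagonal metric with entries +1 and -1 is the same matrix.
    norm = sum(ETA[m] * s[m] * s[m] for m in range(4))
    if norm > tol:
        return "spacelike"    # timelike covector
    if norm < -tol:
        return "timelike"     # spacelike covector
    return "lightlike"

t_hat, x_hat = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
y_hat, z_hat = [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]
t_plus_y = [1.0, 0.0, 1.0, 0.0]   # tangent to the plane y = t

print(classify_plane(x_hat, y_hat, z_hat))      # a plane of simultaneity
print(classify_plane(t_hat, x_hat, y_hat))      # the t-x-y plane
print(classify_plane(t_plus_y, x_hat, z_hat))   # the plane y = t
```

The three calls report spacelike, timelike, and lightlike, in that order.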



Lightlike surfaces have some funny properties. Using birdtracks
notation, suppose that we form such a surface as the space spanned
by the three basis vectors →a, →b, and →c, and let S→ be the
corresponding 3-volume covector. The surface is lightlike, so

S→S = 0. (3)

Because S→ is defined as the function giving the 4-volume of a
parallelepiped spanned by the bases with a fourth vector →d, and
because this volume vanishes when →d is tangent to the surface
(property V2, p. 151), we have

S→a = S→b = S→c = 0. (4)

So in this sense S→ is perpendicular to the surface. In Euclidean
space we are used to describing the orientation of a surface in terms
of the unit normal vector, and this is very nearly what S→ is, except
that it’s a covector rather than a vector, and it also can’t be made to
have unit length, since its magnitude is zero. We could fix the first
of these two problems by constructing the vector →S that is dual
to S→, but this has a disconcerting effect. Combining (3) with the
definition of S→, we find that →S spans a vanishing 4-volume with
the basis vectors, and therefore by V2 we find that →S is tangent
to the surface. Thus in some sense we have a vector that is both
perpendicular to and tangent to a surface, which avoids being absurd
because we are really referring to two different objects, the covector
S→ and the vector →S.

Problem 2.


Problems 

1 Example 3 on p. 149 discussed polar coordinates in the Eu- 
clidean plane. Use the technique demonstrated in section 7.3 to find 
the metric in these coordinates. 

2 Oblique Cartesian coordinates are like normal Cartesian coordinates
in the plane, but their axes are at an angle $\varphi \ne \pi/2$ to
one another. Show that the metric in these coordinates is

$$ds^2 = dx^2 + dy^2 + 2\cos\varphi\, dx\, dy.$$


3 Let a 3-plane U be defined in Minkowski coordinates by the
equation x = t. Is this plane spacelike, timelike, or lightlike? Find
a covector S→ that is normal to U in the sense described on p. 158,
describing it in terms of its components. Compute the vector →S,
also in component form. Verify that S→S = 0. Show that →S is
tangent to U.

4 For the oblique Cartesian coordinates defined in problem
2, use the determinant of the metric to show that the Levi-Civita
tensor is

$$\epsilon = \begin{pmatrix} 0 & \sin\varphi \\ -\sin\varphi & 0 \end{pmatrix}.$$

5 Use the technique demonstrated in example 9, p. 156, to find
the volume of the unit sphere.

Chapter 8

Rotation (optional) 

8.1 Rotating frames of reference 

8.1.1 No clock synchronization 

Panels 1 and 2 of figure a recapitulate the result of example 16 
on p. 34. The set of three clocks fixed to the earth in a/1 have 
been synchronized by Einstein synchronization (example 4, p. 18), 
i.e., by exchanging flashes of light. The three clocks aboard the 
moving train, a/2, have been synchronized in the same way, and 
the events that were simultaneous according to frame 1 are not 
simultaneous in frame 2. There is a systematic shift in the times,
which is represented by the term $t' = \ldots - v\gamma x$ in the Lorentz
transformation (eq. (1), p. 31).


a /Clocks can’t be synchronized 
in a rotating frame of reference. 




Now suppose we take the diagram of the train and wrap it 
around, a/3. If we go on and close the loop, making the chain 
into a circle like a chain necklace, we have a problem. The trend 
in the clock times can continue until it wraps back around to the 
beginning, but then there will be a discrepancy.

We conclude that clocks can’t be synchronized in a rotating
frame of reference. Such a frame does not admit a universal time 
coordinate because Einstein synchronization isn’t transitive: syn- 
chronizing clock A with clock B, and B with C, does not imply that 
A is synchronized with C. This nontransitivity is one way of defining 
what we mean by rotation. That is, if the operational definition of 
an inertial frame given in section 5.1, p. 117, shows that our frame is 
noninertial, and we want to know more about why it’s noninertial, 
testing for this nontransitivity is a way of finding out whether it’s 
because of rotation. 

8.1.2 Rotation is locally detectable 

The people aboard the circular train know that their attempts at 
synchronization fail, so they can tell, without reference to anything 
external, that they’re going in a circle. (Cf. example 1, p. 118.) 

Although this is a book on special, not general, relativity, it’s
interesting to note the following possibility. Suppose that we verify, by
local experiments, that we have a good, nonrotating, inertial frame
of reference. It is then imaginable that if we view distant galaxies
from this frame, we will see them rotate at some angular frequency
$\Omega$ about some axis on the celestial sphere. If this is observed, then
we must infer that it is the universe as a whole (not our laboratory!)
that is rotating. Such an effect has been searched for, and,
for example, an upper limit $\Omega \lesssim 10^{-7}$ radian/year was inferred by
Clemence.¹ General-relativistic models of such rotating cosmologies
have a preferred vector constituting the direction of the axis about
which matter rotates, but there is no global center of rotation.
Current upper limits on $\Omega$ are good enough to rule out any significant
effect on cosmological expansion due to centrifugal forces.

8.1.3 The Sagnac effect 

Although the train scenario is obviously unrealistic, the time
shift is far from hypothetical. This type of effect, called the Sagnac
effect, was first observed by M. Georges Sagnac in 1913, and it
relates to the principle of the ring laser gyroscope (example 2, p. 18),
used in passenger jets. (The name is French, and is pronounced
“sah-NYAHK.”) To find the Sagnac effect quantitatively, we note
that in the circular train example (ignoring signs) the relevant term
in the Lorentz transformation, $v\gamma x$, would accumulate, after one
complete circuit of Einstein synchronization, a discrepancy $\Delta t$ equal
to the circumference of the circle multiplied by $v\gamma$. If the circle’s
radius is r and the angular velocity $\omega$, we have $\Delta t = 2\pi\gamma r^2\omega$. For
speeds small compared to that of light, $\gamma \approx 1$, and this
can be rewritten in terms of the circle’s area A as $\Delta t = 2A\omega$, or,
reinserting factors of c to accommodate SI units, $\Delta t = 2A\omega/c^2$. The
proportionality to the enclosed area is not an accident; the product
vx has the form of the integrand $F \cdot ds$ occurring in Stokes’ theorem.


1 “Astronomical time,” Rev. Mod. Phys. 29 (1957) 2.

Sagnac effect in the Hafele-Keating experiment Example 1
A clock at the equator of the earth rotates at a frequency $\omega$ of
$2\pi$ radians per sidereal day, suffering a Sagnac effect of 210 ns
per day. The traveling atomic clocks in the Hafele-Keating experiment
(p. 15) went around the world in both directions, and were
compared with a third set of clocks that stayed in Washington, DC.
Since the time required to fly around the earth was also on the order
of one day, the differences in the values of $\omega$ for the three sets
of clocks were on the same order of magnitude as the $\omega$ of the
earth, and we therefore expect cumulative differential Sagnac effects
that are also on the order of a hundred nanoseconds. These
effects exist only in the rotating frame of the earth, but the things 
being measured are proper times, and proper time is a scalar, so 
the experimental results are independent of what frame of refer- 
ence is used for calculating them. Since the airline pilots provided 
Hafele and Keating with navigational data referred to the rotating 
earth, they analyzed their results in the rotating frame, in which 
there was a Sagnac effect. They could equally well have trans- 
formed their data into the frame of the stars, in which case the 
same result would have been predicted, but it would have been 
described as arising from kinematic time dilation. 
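The figure of about 210 ns can be reproduced from $\Delta t = 2A\omega/c^2$. In this sketch the earth’s equatorial radius and the length of the sidereal day are round values that I have assumed, and the function name is mine.

```python
import math

C = 299_792_458.0          # speed of light, m/s
R_EARTH = 6.378e6          # equatorial radius in m (assumed round value)
SIDEREAL_DAY = 86_164.1    # seconds (assumed value)

def sagnac_shift(area, omega):
    """Time discrepancy 2*A*omega/c^2 for one circuit enclosing the given area."""
    return 2.0 * area * omega / C**2

omega_earth = 2.0 * math.pi / SIDEREAL_DAY    # rad/s
equatorial_area = math.pi * R_EARTH**2        # m^2, area enclosed by the equator
dt = sagnac_shift(equatorial_area, omega_earth)
print(f"Sagnac shift: {dt * 1e9:.0f} ns")
```

With these inputs the result is a little over 200 ns, consistent with the value quoted in the example.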

Ring laser gyroscope Example 2

The ring laser gyroscope in the photo in example 2 on p. 18 looks
like it has an area on the order of $10^2\ \text{cm}^2$ and uses red light.
For use in navigation, one wants to be able to detect a change in
course of, say, one degree in an hour, or $\omega \sim 5 \times 10^{-6}$ radian/s.
The result is a time shift $\Delta t \sim 10^{-24}$ s, which for red light is a
phase shift of only $\Delta\phi = 4\pi A\omega/c\lambda \sim 3 \times 10^{-9}$ radian. In the
original early twentieth-century experiments, this phase shift would have
had to be measured by producing interference between the two
beams and measuring the change in intensity resulting from this
change in phase. Our estimate of $\Delta\phi$ shows that this is impractical
for a portable instrument. In a modern ring laser gyroscope, an
active laser medium is inserted in the loop, and the result is that
the loop resonates at a frequency that is shifted from the laser’s
natural frequency by $\Delta f \sim \Delta\phi\, c/L$, where L is the circumference.
The result is a frequency shift of a few Hz, which is easily measurable.
An alternative technique, used in the fiber optic gyroscope,
is to wrap N turns of optical fiber around the circumference,
effectively changing A to NA.
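The order-of-magnitude estimates in this example can be checked with a few lines of arithmetic. The wavelength of 650 nm for “red light” and the assumption that the loop is a circle (used only to get a circumference from the area) are mine, so the outputs should be read as rough figures.

```python
import math

C = 299_792_458.0                    # speed of light, m/s
AREA = 1.0e-2                        # 10^2 cm^2 expressed in m^2
WAVELENGTH = 650e-9                  # m, assumed value for red light
OMEGA = math.radians(1.0) / 3600.0   # one degree per hour, in rad/s
# Circumference of a circle with this area (assumed loop shape).
CIRCUMFERENCE = 2.0 * math.sqrt(math.pi * AREA)

dt = 2.0 * AREA * OMEGA / C**2                           # time shift, s
dphi = 4.0 * math.pi * AREA * OMEGA / (C * WAVELENGTH)   # phase shift, rad
df = dphi * C / CIRCUMFERENCE                            # frequency shift, Hz

print(f"omega = {OMEGA:.2e} rad/s")
print(f"dt    = {dt:.2e} s")
print(f"dphi  = {dphi:.2e} rad")
print(f"df    = {df:.1f} Hz")
```

The numbers come out near $10^{-24}$ s, $3 \times 10^{-9}$ radian, and a few Hz, matching the estimates in the text.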

8.1.4 A rotating coordinate system 

The GPS system is a practical example of a case where we nat- 
urally want to employ a rotating coordinate system. Hikers and 
sailors, after all, want to know where they are relative to the earth’s 
rotating surface. Since locations need to be determined to within 
meters, the timing of signals needs to be done to a precision of
something like (1 m)/c, which is a few nanoseconds. This is why
the GPS satellites have atomic clocks aboard, and timing to this
precision clearly requires that relativistic effects be taken into account.
We therefore need not a rotating Newtonian coordinate system but
a rotating relativistic one. Let’s start with the nonrotating frame,
and define coordinates $(t, r, \theta, z)$, with the spatial part $(r, \theta, z)$ being
ordinary cylindrical coordinates. For simplicity, we’ll neglect the z
coordinate in what follows. Extending the result of problem 1 on
p. 160 from 2 + 0 dimensions to 2 + 1, we have the metric

$$ds^2 = dt^2 - dr^2 - r^2\, d\theta^2. \qquad (1)$$

The results of section 8.1.1 show that we do not expect to be able
to define a completely satisfactory time coordinate in the rotating
frame, so let’s start with the minimal change $(t, r, \theta) \rightarrow (t, r, \theta')$,
where $\theta' = \theta - \omega t$. This is at least enough to make world-lines of
constant $\theta'$ be ones that revolve around the origin at the appropriate
frequency. Substituting $d\theta = d\theta' + \omega\, dt$, we find

$$ds^2 = (1 - \omega^2 r^2)\, dt^2 - dr^2 - r^2\, d\theta'^2 - 2\omega r^2\, d\theta'\, dt. \qquad (2)$$

Recognizing $\omega r$ as the velocity of one frame relative to another,
and $(1 - \omega^2 r^2)^{-1/2}$ as $\gamma$, we see that we do have a relativistic time
dilation effect in the $dt^2$ term. But the $dr^2$ and $d\theta'^2$ terms look the
same as in equation (1). Why don’t we see any Lorentz contraction
of the length scale in the azimuthal direction?

The answer is that coordinates in relativity are arbitrary, and
just because we can write down a certain set of coordinates, that
doesn’t mean they have any special physical interpretation. The
coordinates $(t, r, \theta')$ do not correspond physically to the quantities that
a rotating observer R would measure with clocks and meter-sticks.
If R uses a ruler to measure a short arc along the circumference of
the circle $r = r_0$, the distance is a distance being measured between
events in spacetime that are simultaneous in the rest frame of the
ruler, and these do not occur at the same value of the time coordinate
t. In the Lorentz transformation, for linear motion, it is the
$-v\gamma x$ term applied to the times that fixes this problem and makes t'
properly represent simultaneity in the new frame. In our rotational
version, we could try to do something similar by defining a time
coordinate $t' = t + f\theta'$, where f is a function of r that is engineered
so that the $d\theta'\, dt$ cross term in the metric would go away. This can
be done (the function f that works turns out to be $\omega r^2/(1 - \omega^2 r^2)$),
but the problem is that the t' coordinate is not single-valued, in the
sense that $(t, r, \theta)$ and $(t, r, \theta + 2\pi)$ would not produce the same t'.
This is inevitable, as we’ve seen in section 8.1.1, so we can’t improve
on the coordinates $(t, r, \theta')$ and the metric (2).
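As a sanity check on metric (2), one can evaluate the interval for the same small displacement in both coordinate systems and confirm that the two expressions agree. The sample numbers below are arbitrary values of mine (in units with c = 1 and chosen so that $\omega r < 1$), and the function names are just labels for this sketch.

```python
import math

def ds2_nonrotating(dt, dr, dtheta, r):
    """Interval from metric (1), in coordinates (t, r, theta)."""
    return dt**2 - dr**2 - r**2 * dtheta**2

def ds2_rotating(dt, dr, dtheta_p, r, omega):
    """Interval from metric (2), in coordinates (t, r, theta')."""
    return ((1.0 - omega**2 * r**2) * dt**2 - dr**2
            - r**2 * dtheta_p**2 - 2.0 * omega * r**2 * dtheta_p * dt)

r, omega = 0.5, 0.8                  # omega*r = 0.4, below the speed of light
dt, dr, dtheta_p = 0.3, -0.2, 0.7    # an arbitrary coordinate displacement
dtheta = dtheta_p + omega * dt       # the substitution used in the text

lhs = ds2_nonrotating(dt, dr, dtheta, r)
rhs = ds2_rotating(dt, dr, dtheta_p, r, omega)
gamma = (1.0 - omega**2 * r**2) ** -0.5

print(lhs, rhs)
print("gamma at this radius:", gamma)
```

The two intervals agree to rounding error, and the coefficient of $dt^2$ in (2) is $1/\gamma^2$ for the velocity $\omega r$.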

The coordinates $(t, r, \theta')$, with the metric (2), are the ones used
in the GPS system, and in that context are called Earth-Centered
Inertial (ECI) coordinates. (Another name is Born coordinates.)
Their time coordinate is not the time measured by a clock in the 
rotating frame but is simply the time coordinate of the nonrotating 
frame of reference tied to the earth’s center. Conceptually, we can 
imagine this time coordinate as one that is established by sending 
out an electromagnetic “tick-tock” signal from the earth’s center, 
with each satellite correcting the phase of the signal based on the 
propagation time inferred from its own r. In reality, this is accom- 
plished by communication with a master control station in Colorado 
Springs, which communicates with the satellites via relays at Kwa- 
jalein, Ascension Island, Diego Garcia, and Cape Canaveral. 

8.2 Angular momentum 

Nonrelativistically, the angular momentum of a particle with momentum
p, at a position r relative to some arbitrarily fixed point,
is $L = r \times p$. When we generalize this equation to relativity, we run
into a number of issues. Issues due to special relativity:

1. The vector cross product only makes sense in three dimensions, 
so it is not well defined in special relativity (sec. 7.6.2, p. 153). 

2. Assuming we get around issue number 1, how do we know that 
this quantity is conserved? 

And from general relativity: 

3. In general relativity, only infinitesimally small spatial or space- 
time displacements dr can be treated as vectors. Larger ones 
cannot. This is because spacetime can be curved, and vectors 
can’t be used to define displacements on a curved space (e.g., 
the surface of the earth). 

4. If space has a nontrivial topology, then we may not be able to 
define an orientation (sec. 7.6.2, p. 153). 

For points 3 and 4, we refer to Hawking and Ellis, p. 62. Number 
2 is addressed in sec. 9.3.5, p. 193. For number 2 we will need 
the stress-energy tensor, which will be described in ch. 9. Lest you 
feel totally cheated, we will resolve issue number 1 in section 8.2.2, 
p. 167, but before we do that, let’s consider an interesting example 
that can be handled with simpler math. 

8.2.1 The relativistic Bohr model 

If we want to see an interesting real-world example of relativistic
angular momentum, we need something that rotates at relativistic
velocities. At large scales we have astrophysical examples such as
neutron stars and the accretion disks of black holes, but these involve
gravity and would therefore require general relativity. At microscopic
scales we have systems such as hadrons, nuclei, atoms, and
molecules. These are quantum-mechanical, and relativistic quantum
mechanics is a difficult topic that is beyond the scope of this book,
but we can sidestep that issue by using the Bohr model of the atom.
In the Bohr model of hydrogen, we assume that the electron has a
circular orbit governed by Newton’s laws, does not radiate, and has
its angular momentum quantized in units of $\hbar$. Let’s generalize the
Bohr model by applying relativity.

It will be convenient to define the constant $\alpha = ke^2/\hbar$, known as
the fine structure constant, where k is the Coulomb constant and e is
the fundamental charge. The fine structure constant is unitless and
is approximately 1/137. It is essentially a measure of the strength
of the electromagnetic interaction, and in the Bohr model it also
turns out to be the velocity of the electron (in units of c) in the
ground state of hydrogen. Because this velocity is small compared
to 1, we expect relativistic corrections in hydrogen to be small,
of relative size $\alpha^2$. But we have an interesting opportunity to get
at some additional and more exciting physics if we consider a
hydrogenlike atom, i.e., an ion with Z protons in the nucleus and only
one electron. Raising Z cranks up the energy scale and therefore
increases the velocity as well.

Combining the Coulomb force law with the result of example
13, p. 101, for uniform circular motion, we have $kZe^2/r = m\gamma v^2$,
where the factor of $\gamma$ is the relativistic correction. The electron’s
momentum is perpendicular to the radius vector, so we assume for
the moment that (as turns out to be true) the angular momentum
is given by $L = rp = mv\gamma r$, where again a relativistic correction
factor of $\gamma$ appears. This is quantized, so let $L = \ell\hbar$, where $\ell$ is an
integer. Solving these equations gives

$$v = \frac{Z\alpha}{\ell}$$

$$r = \frac{\ell^2 \hbar}{mZ\alpha\gamma}.$$

These differ from the nonrelativistic versions only by the factor of
$\gamma$ in the second equation. The electrical energy is $U = -kZe^2/r$,
and the kinetic energy $K = m(\gamma - 1)$ (with c = 1). We will find
it convenient to work with the (positive) binding energy in units of
the mass of the electron. Call this quantity $\mathcal{E}$. After some algebra,
the result is

$$\mathcal{E} = 1 - \sqrt{1 - v^2}.$$

Surprisingly, this is also the exact result given by relativistic quantum
mechanics if we solve the Dirac equation for the ground state,
or if we take a high-energy (nearly unbound) state with the maximum
value of $\ell$, as is appropriate for a semiclassical circular orbit.
So we can see that even though our quantum mechanics was crude,
our relativity makes some sense and gives reasonable results. For
small Z, a Taylor series approximation gives $\mathcal{E} = v^2/2 + v^4/8 + \ldots$,
where the fourth-order term represents the relativistic correction.
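Here is a small sketch that evaluates $v = Z\alpha/\ell$ and $\mathcal{E} = 1 - \sqrt{1 - v^2}$ and compares the latter with the two-term Taylor series. The numerical value used for $\alpha$ is approximate, the function names are mine, and I have used a complex square root so that the function still returns a value when v exceeds 1.

```python
import cmath

ALPHA = 1.0 / 137.036   # fine structure constant (approximate value)

def orbit_speed(Z, ell=1):
    """Electron speed v = Z*alpha/ell in units of c."""
    return Z * ALPHA / ell

def binding_energy(Z, ell=1):
    """Binding energy 1 - sqrt(1 - v^2) in units of the electron mass."""
    v = orbit_speed(Z, ell)
    # cmath.sqrt gives an imaginary result instead of an error when v > 1.
    return 1.0 - cmath.sqrt(1.0 - v * v)

def taylor_estimate(Z, ell=1):
    """First two terms of the small-v expansion, v^2/2 + v^4/8."""
    v = orbit_speed(Z, ell)
    return v**2 / 2 + v**4 / 8

for Z in (1, 50, 100, 140):
    print(Z, orbit_speed(Z), binding_energy(Z), taylor_estimate(Z))
```

For hydrogen the exact expression and the series are indistinguishable at this precision, while for larger Z the series increasingly underestimates the binding energy.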

So far so good, but now what if we crank up the value of Z to
make the relativistic effects strong? A very disturbing thing happens
when we make $Z > 137 \approx 1/\alpha$. In the ground state we get v > 1
and a complex number for $\mathcal{E}$. Clearly something has broken down,
and our results no longer make sense. We might be inclined to
dismiss this as a consequence of our crude model, but remember, our
calculations happened to give the same result as the Dirac equation,
which has real relativistic quantum mechanics baked in. We should
take this breakdown as evidence of a real physical breakdown. The
interpretation is as follows.

According to quantum mechanics, the vacuum isn’t really a vacuum.
Particle-antiparticle pairs are continually popping into existence
in empty space and then reannihilating one another. Their
temporary creation is a violation of the conservation of mass-energy,
but only a temporary violation, and this is allowed by the time-energy
form of the Heisenberg uncertainty principle, $\Delta E\, \Delta t \gtrsim h$, as
long as $\Delta t$ is short. It’s as though we steal some money, but the
police don’t catch us as long as we put it back before anyone can 
notice. Because these particles are only temporarily in our universe, 
we call them virtual particles, as opposed to real particles that have 
a potentially permanent existence and can be detected as blips on 
a Geiger counter. 

But when the vacuum contains an electric field that is beyond a 
certain critical strength, it becomes possible to create an electron- 
antielectron pair, let the opposite charges separate and release en- 
ergy, and pay off the energy debt without having to reannihilate 
the particles. This is known as “sparking the vacuum.” As of this 
writing, only nuclei with Z up to about 118 have been discovered, 
and in any case the critical Z value of $1/\alpha \approx 137$ was only a rough
estimate. But by colliding heavy nuclei such as lead, one can at 
least temporarily form an unstable compound system with a high 
Z, and attempts are being made to search for the predicted effect 
in the laboratory. 

8.2.2 The angular momentum tensor 

As mentioned previously, there is no such thing as a vector cross
product in four dimensions, so the nonrelativistic definition of angular
momentum as $L = r \times p$ needs to be modified to be usable in
relativity.

Given a position vector $r^a$ and a momentum vector $p^b$, we expect
based both on units and the correspondence principle that a
relativistic definition of angular momentum must be some kind of
a product of the vectors. Based on the rules of index notation, we
don’t have much leeway here. The only products we can form are
$r^a p^b$, which is a rank-2 tensor, or $r^a p_a$, a scalar. Since nonrelativistic
angular momentum is a three-vector, the correspondence principle
tells us that its relativistic incarnation can’t be a scalar: there
simply wouldn’t be enough information in a scalar to tell us the
things that the nonrelativistic angular momentum vector tells us:
what axis the rotation is about, and which direction the rotation is.

The tensor $r^a p^b$ also has a problem, but one that can be fixed.
Suppose that in a certain frame of reference a particle of mass $m \ne 0$
is at rest at the origin. Then its position four-vector at time t is
(t, 0, 0, 0), and its energy-momentum vector is (m, 0, 0, 0). These
vectors are parallel. The tensor $r^a p^b$ is nonzero and nonconserved as
time flows, but clearly we want the angular momentum of an isolated
particle to be conserved. Another example would be if, at a certain
moment in time, we had r = (0, x, 0, 0) and p = (E, p, 0, 0), with
both x and p positive. This particle’s motion is directly away from
the origin, so its angular momentum should be zero by symmetry,
but $r^a p^b$ is again nonzero.

The way to fix the problem is to force the product of the position
and momentum vectors to be an antisymmetric tensor:

$$L^{ab} = r^a p^b - r^b p^a.$$

Antisymmetric means that $L^{ab} = -L^{ba}$, so that elements on opposite
sides of the main diagonal are the same except for opposite signs. A
quick check shows that this gives the expected zero result for the
space-space components in both of the above examples. A component
such as $L^{yz}$ measures the amount
of rotation in the y-z plane. In a nonrelativistic context, we would
have described this as an x component $L_x$ of the angular momentum
three-vector, because a rotation of the y-z plane about the origin is
a rotation about the x axis; such a rotation keeps the x axis fixed.
But in four-dimensional spacetime, a rotation in the y-z plane keeps
the entire t-x plane fixed, so the notion of rotation “about an axis”
breaks down. (Notice the pattern: in two dimensions we rotate
about a point, in three dimensions rotation is about a line, and in
four dimensions we rotate about a fixed plane.) In sec. 9.3.5, p. 193,
we show that $L^{ab}$ is conserved.
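To see the antisymmetrized definition in action, the sketch below evaluates $L^{ab} = r^a p^b - r^b p^a$ for the two examples from the text and for one particle that really does have angular momentum. The specific numbers (t = 5, m = 2, and so on) and the function name are arbitrary choices of mine; components are ordered (t, x, y, z).

```python
def angular_momentum_tensor(r, p):
    """L^{ab} = r^a p^b - r^b p^a as a nested list, indices ordered (t, x, y, z)."""
    n = len(r)
    return [[r[a] * p[b] - r[b] * p[a] for b in range(n)] for a in range(n)]

# Particle of mass m = 2 at rest at the origin, at time t = 5.
L_rest = angular_momentum_tensor([5.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0])

# Particle at x = 3 moving radially outward, with E = 5 and p = 4 (so m = 3).
L_radial = angular_momentum_tensor([0.0, 3.0, 0.0, 0.0], [5.0, 4.0, 0.0, 0.0])

# Particle at x = 1 moving in the y direction: rotation in the x-y plane.
L_orbit = angular_momentum_tensor([0.0, 1.0, 0.0, 0.0], [3.0, 0.0, 2.0, 0.0])

for name, L in [("rest", L_rest), ("radial", L_radial), ("orbit", L_orbit)]:
    print(name)
    for row in L:
        print("  ", row)
```

In the radial case the space-space block vanishes but the component $L^{tx} = -xE$ does not; the meaning of the time-space components is taken up later in this section.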

If we lay the angular momentum tensor out in matrix format, it
looks like this:

$$\begin{pmatrix}
0 & L^{tx} & L^{ty} & L^{tz} \\
  & 0 & L^{xy} & L^{xz} \\
  &   & 0 & L^{yz} \\
  &   &   & 0
\end{pmatrix}$$

The zeroes on the main diagonal are due to the antisymmetrization
in the definition. I’ve left blanks below the main diagonal because
although those components can be nonzero, they only contain a
(negated) copy of the information given by the ones above the diagonal.
We can see that there are really only 6 pieces of information
in this 4 x 4 matrix, and we’ve already physically interpreted the
triangular cluster of three space-space components on the bottom
right.

Why do we have the row on the top, consisting of the time-space
components, and what do they mean physically? A highbrow
answer would be that this is something very deep having to do
with the fact that, as described in section 8.3 below, rotation and
linear motion are not as cleanly separated in relativity as they are
in nonrelativistic physics. A more straightforward answer is that in
most situations these components are actually not very interesting.
Consider a cloud of particles labeled i = 1 through n. Then for a
representative component from the top row we have the total value

$$L^{tx} = \sum t_i p_i^x - \sum x_i E_i.$$

Now suppose that we fix a certain surface of simultaneity at time t.
The sum becomes

$$L^{tx} = t \sum p_i^x - \sum x_i E_i.$$

There is information here, but it’s not exciting information about
angular momentum, it’s boring information about the position and
motion of the system’s center of mass. If we fix a frame of reference
in which the total momentum is zero, i.e., the center of mass frame,
then we have $\sum p_i^x = 0$. Let’s also define the position of the center of
mass as the average position weighted by mass-energy, rather than
the mass-weighted average, as we would do in Newtonian mechanics.
Then the sum $\sum x_i E_i$ is a constant relating to the position of the
center of mass, and if we like we can make it equal zero by choosing
the origin of our spatial coordinates to coincide with the center of
mass.

With these choices we have a much simpler angular momentum
tensor:

$$\begin{pmatrix}
0 & 0 & 0 & 0 \\
  & 0 & L^{xy} & L^{xz} \\
  &   & 0 & L^{yz} \\
  &   &   & 0
\end{pmatrix}$$

If we wish, we can sprinkle some notational sugar on top of all
of this using the Levi-Civita tensor $\epsilon$ described in optional section
7.6, p. 151. Let’s define a new tensor ${}^*L$ according to

$${}^*L_{ij} = \frac{1}{2} \epsilon_{ijkl} L^{kl}.$$

Then for an observer with velocity vector $o^\mu$, the quantity $o^\mu\, {}^*L_{\mu\nu}$
has the form $(0, L^{yz}, L^{zx}, L^{xy})$ (problem 3, p. 174). That is, its
spatial components are exactly the quantities we would have expected
for the nonrelativistic angular momentum three-vector (using the
correct relativistic momentum).

8.3 Boosts and rotations


A relative of mine fell in love. She and her boyfriend bought a house 
in the suburbs and had a baby. They think they’ll get married at 
some later point. An engineer by training, she says she doesn’t want 
to get hung up on the “order of operations.” For some mathematical 
operations, the order doesn’t matter: 5 + 7 is the same as 7 + 5. 


b / Performing the rotations in one order gives one result, 3, while
reversing the order gives a different result, 5.


8.3.1 Rotations 

But figure b shows that the order of operations does matter 
for rotations. Rotating around the x axis and then y produces a 
different result than y followed by x. We say that rotations are 
noncommutative. This is why, in Newtonian mechanics, we don’t
have an angular displacement vector $\Delta\theta$; vectors are supposed
to be additive, and vector addition is commutative. For small rotations,
however, the discrepancy caused by choosing one order of operations
rather than the other becomes small (of order $\theta^2$), so we can
define an infinitesimal displacement vector $d\theta$, whose direction is
given by the right-hand rule, and an angular velocity $\omega = d\theta/dt$.

As an example of how this works out for small rotations, let’s 
take the vector 

(0,0,1) (3) 

and apply the operations shown in figure b, but with rotations of 
only $\theta = 0.1$ radians rather than 90 degrees. Rotation by this
angle about the x axis is given by the transformation
$(x, y, z) \rightarrow (x, y\cos\theta - z\sin\theta, y\sin\theta + z\cos\theta)$,
and applying this to the original
vector gives this: 

(0.00000, -0.09983, 0.99500) (after x) (4) 



After a further rotation by the same angle, this time about the y 
axis, we have 

(0.09933, -0.09983, 0.99003) (after x, then y) (5) 

Starting over from the original vector (3) and doing the operations 
in the opposite order gives these results: 

(0.09983, 0.00000, 0.99500) (after y) (6)

(0.09983, -0.09933, 0.99003) (after y, then x) (7) 

The discrepancy between (5) and (7) is a rotation by very nearly
0.005 radians in the xy plane. As claimed, this is on the order of
$\theta^2$ (in fact, it’s almost exactly $\theta^2/2$). A single example
can never prove anything, but this is an example of the general rule
that rotations along different axes don’t commute, and for small angles
the discrepancy is a rotation in the plane defined by the two axes, with
a magnitude whose maximum size is on the order of $\theta^2$.
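
The arithmetic in equations (4) through (7) is easy to reproduce. The following is a minimal Python sketch, not part of the original text. The helper names rot_x and rot_y are hypothetical, and the y-axis rotation is assumed to follow the same right-handed convention as the x-axis formula quoted above.

```python
import math

def rot_x(v, theta):
    """Rotate the 3-vector v by the angle theta about the x axis."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (x, y * c - z * s, y * s + z * c)

def rot_y(v, theta):
    """Rotate the 3-vector v by the angle theta about the y axis
    (assumed right-handed, matching rot_x)."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + z * s, y, -x * s + z * c)

theta = 0.1
v0 = (0.0, 0.0, 1.0)                  # the starting vector, equation (3)

xy = rot_y(rot_x(v0, theta), theta)   # x first, then y: compare with (5)
yx = rot_x(rot_y(v0, theta), theta)   # y first, then x: compare with (7)

# Angle in the xy plane between the projections of the two results.
angle = math.atan2(yx[1], yx[0]) - math.atan2(xy[1], xy[0])

print([round(c, 5) for c in xy])
print([round(c, 5) for c in yx])
print(angle, theta ** 2 / 2)
```

Changing theta to a smaller value such as 0.01 shows the discrepancy shrinking in proportion to the square of the angle.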

8.3.2 Boosts 

Something similar happens for boosts. In 3 + 1 dimensions, we 
start with the vector 

(0, 1, 0, 0), (8)

pointing along the x axis. A Lorentz boost with v = 0.1 (eq. (1), 
p. 31) in the x direction gives 

(0.10050, 1.00504, 0.00000, 0.00000) (after x) (9) 

and a second boost, now in the y direction, produces this: 

(0.10101, 1.00504, 0.01010, 0.00000) (after x, then y) (10) 

Starting over from (8) and doing the boosts in the opposite order,
we have

(0.00000, 1.00000, 0.00000, 0.00000) (after y) (11)

(0.10050, 1.00504, 0.00000, 0.00000) (after y, then x) (12) 

The discrepancy between (10) and (12) is a rotation in the xy plane
by very nearly 0.01 radians. This is an example of a more general
fact, which is that boosts along different axes don’t commute, and
for small velocities the discrepancy is a rotation in the plane defined
by the two boosts, with a magnitude whose maximum size is on the
order of $v^2$, in units of radians.
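
Equations (9) through (12) can be checked the same way. This is a minimal Python sketch under stated assumptions: the function name boost is hypothetical, units are such that c = 1, and the sign convention of the boost is chosen to match the numbers printed above.

```python
import math

def boost(vec, v, axis):
    """Lorentz-boost the four-vector (t, x, y, z) by velocity v along the
    spatial axis 1 (x), 2 (y), or 3 (z), in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    out = list(vec)
    out[0] = gamma * (vec[0] + v * vec[axis])
    out[axis] = gamma * (vec[axis] + v * vec[0])
    return tuple(out)

v = 0.1
e_x = (0.0, 1.0, 0.0, 0.0)            # the starting vector, equation (8)

xy = boost(boost(e_x, v, 1), v, 2)    # x boost, then y boost: compare with (10)
yx = boost(boost(e_x, v, 2), v, 1)    # y boost, then x boost: compare with (12)

# The spatial part of xy is tilted in the xy plane relative to that of yx.
angle = math.atan2(xy[2], xy[1]) - math.atan2(yx[2], yx[1])

print([round(c, 5) for c in xy])
print([round(c, 5) for c in yx])
print(angle, v ** 2)
```

The printed angle is close to v squared, consistent with the claim that the discrepancy is of that order.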

8.3.3 Thomas precession 

Figure c shows the most important physical consequence of all
this. The gyroscope is sent around the perimeter of a square, with
impulses provided by hammer taps at the corners. Each impulse
can be modeled as a Lorentz boost, notated, e.g., $L_x$ for a boost
in the x direction. The series of four operations can be written as
$L_y L_x L_{-y} L_{-x}$, using the notational convention that the first
operation applied is the one on the right side of the list. If boosts
were commutative, we could swap the two operations in the middle of
the list, giving $L_y L_{-y} L_x L_{-x}$. The $L_x$ would undo the
$L_{-x}$, and the $L_y$ would undo the $L_{-y}$. But boosts aren’t
commutative, so the vector representing the orientation of the gyroscope
is rotated in the xy plane. This effect is called the Thomas precession,
after Llewellyn Thomas (1903-1992). Thomas precession is a purely
relativistic effect, since a Newtonian gyroscope does not change its axis
of rotation unless subjected to a torque; if the boosts are accomplished
by forces that act at the gyroscope’s center, then there is no
nonrelativistic explanation for the effect.

c / Nonrelativistically, the gyroscope should not rotate as long
as the forces from the hammer are all transmitted to it at its
center of mass.

Clearly we should see the same effect if the jerky motion in figure
c were replaced by uniform circular motion, and something similar
should happen in any case in which a spinning object experiences an
external force. In the limit of low velocities, the general expression
for the angular velocity of the precession is
$\Omega = \frac{1}{2} a \times v$, and in the case of circular motion,
where $|a \times v| = v^2\omega$, this gives $\Omega = (1/2)v^2\omega$.
Here $\omega$ is the angular frequency of the circular motion.

If we want to see this precession effect in real life, we should 
look for a system in which both v and a are large. An atom is 
such a system. The Bohr model, introduced in 1913, marked the 
first quantitatively successful, if conceptually muddled, description 
of the atomic energy levels of hydrogen. Continuing to take c = 1,
the over-all scale of the energies was calculated to be proportional
to $m\alpha^2$, where m is the mass of the electron, and $\alpha$ is the
fine structure constant, defined earlier. At higher resolution, each
excited energy level is found to be split into several sub-levels. The
transitions among these close-lying states are in the millimeter region
of the microwave spectrum. The energy scale of this fine structure
is $\sim m\alpha^4$. This is down by a factor of $\alpha^2$ compared to the
visible-light transitions, hence the name of the constant. Uhlenbeck and
Goudsmit showed in 1926 that a splitting on this order of magni- 
tude was to be expected due to the magnetic interaction between 
the proton and the electron’s magnetic moment, oriented along its 
spin. The effect they calculated, however, was too big by a factor 
of two. 

The explanation of the mysterious factor of two had in fact been 
implicit in a 1916 calculation by Willem de Sitter, one of the first 
applications of general relativity. De Sitter treated the earth-moon 
system as a gyroscope, and found the precession of its axis of rota- 
tion, which was partly due to the curvature of spacetime and partly 
due to the type of rotation described earlier in this section. The 
effect on the motion of the moon was noncumulative, and was only 
about one meter, which was much too small to be measured at the 
time. In 1927, however, Thomas applied similar reasoning to the
hydrogen atom, with the electron’s spin vector playing the role of
gyroscope. Since the electron’s spin is $\hbar/2$, the energy splitting is
$\pm(\hbar/2)\Omega$, depending on whether the electron’s spin is in the same
direction as its orbital motion, or in the opposite direction. This is
less than the atom’s gross energy scale $\hbar\omega$ by a factor of $v^2/2$,
which is $\sim\alpha^2$. The Thomas precession cancels out half of the
magnetic effect, bringing theory into agreement with experiment.

Uhlenbeck later recalled: “...when I first heard about [the Thomas 
precession], it seemed unbelievable that a relativistic effect could 
give a factor of 2 instead of something of order v/c... Even the 
cognoscenti of relativity theory (Einstein included!) were quite
surprised.”

d / States in hydrogen are labeled with their l and s quantum
numbers, representing their orbital and spin angular momenta
in units of $\hbar$. The state with s = +1/2 has its spin angular
momentum aligned with its orbital angular momentum, while the
s = -1/2 state has the two angular momenta in opposite
directions. The direction and order of magnitude of the splitting
between the two l = 1 states is successfully explained by
magnetic interactions with the proton, but the calculated effect
is too big by a factor of 2. The relativistic Thomas precession
cancels out half of the effect.

Problems


1 In the 1925 Michelson-Gale-Pearson experiment, the physicists 
measured the Sagnac effect due to the earth’s rotation. They laid 
out a rectangle of sewer pipes with length x = 613 m and width y =
339 m, and pumped out the air. The latitude of the site in Illinois
was 41°46', so that the effective area was equal to the projection of
the rectangle into the plane perpendicular to the earth’s axis. Light
was provided by a sodium discharge with $\lambda$ = 570 nm. The light
was sent in both directions around the rectangle and interfered, 
effectively doubling the area. Clever techniques were required in 
order to calibrate the apparatus, since it was not possible to change 
its orientation. Calculate the number of wavelengths by which the 
relative phase of the two beams was expected to shift due to the 
Sagnac effect, and compare with the experimentally measured result 
of 0.230 ± 0.005 cycles. 

2 The relativistic heavy ion collider RHIC collides counter-rotating
beams of gold nuclei at 9 GeV/nucleon. If a gold nucleus is
approximately a sphere with radius $6 \times 10^{-15}$ m, find the maximum
angular momentum, in units of $\hbar$, about the center of mass for a
sideswiping collision. Answer: $\sim 10^5$.

3 Show, as claimed on p. 169, that the time-space components
of the tensor $^\star L$ equal the angular momentum three-vector.

Chapter 9

Flux 

9.1 The current vector 

9.1.1 Current as the flux of charged particles

The most fundamental laws of physics are conservation laws, 
which tell us that we can’t create or destroy “stuff,” where “stuff” 
could mean quantities such as electric charge or energy-momentum. 
Since charge is a Lorentz invariant, it’s an easy example to start 
with. Because charge is invariant, we might also imagine that charge
density $\rho$ was invariant. But this is not the case, essentially because
spatial (3-dimensional) volume isn’t invariant; in 3 + 1 dimensions,
only four-dimensional volume is an invariant (problem 2, p. 51). For
example, suppose we have an insulator in the shape of a cube, with
charge distributed uniformly throughout it according to an observer
$o_1$ at rest relative to the cube. Then in a frame $o_2$ moving relative
to the cube, parallel to one of its axes, the cube becomes foreshortened
by length contraction, and its volume is reduced by the factor $1/\gamma$.
The result is that the charge density in $o_2$ is greater by a factor of
$\gamma$.

This means that knowledge of the charge density $\rho$ in one frame
is insufficient to determine the charge density in another frame. In
the example of the cube, what would be sufficient would be knowledge
of the vector $J = \rho_0 v$, where $\rho_0$ is the charge density in the
cube’s rest frame, and v is the cube’s velocity vector. J, called
the current vector, transforms as a relativistic vector because of the
transformation properties of the two factors that define it. The
velocity v is a vector (section 3.5.1). The factor $\rho_0$ is an invariant,
since it in turn breaks down into charge divided by rest-volume.
Charge is an invariant, and all observers agree on what volume
the cube would have in its rest frame.

J can be expressed in Minkowski coordinates as $(\rho, J^x, J^y, J^z)$,
where $\rho$ is the charge density and, e.g., $J^x$ is the density of electric
current in the x direction. Suppose we define the three-surface S
shown in figure a/1, consisting of the set of events with coordinates
(t, 0, y, z) such that $0 \le t \le 1$, $0 \le y \le 1$, and $0 \le z \le 1$. Some
charged particles have world-lines that intersect this surface, passing
through it either in the positive x direction or the negative x
direction (which we count as negative charge transport). S has a
three-volume V. If we add up the total charge transport $\Delta q$ across
this surface and divide by V, we get the average value of $J^x$. If we
let S shrink down to smaller and smaller three-surfaces surrounding
the event (0, 0, 0, 0), then we get the value of $J^x$ at this
point, $\lim_{V \to 0} \Delta q/V$. In other words, $J^x$ measures the flux
density of charge that passes through S. Of course this description in
terms of a limit implies a large number of charges, not just one as in
figure a.

a / Charged particles with world-lines that contribute to $J^x$
and $\rho$. The z dimension isn’t shown, so the cubical 3-surfaces
appear as squares.


You can write out the analogous definition for $J^t$, using a surface
of simultaneity such as S', figure a/2, and you’ll see that it expresses
the density of charge $\rho$. In this case S' represents a moment in time,
and the flux through S' means that the charges are crossing the
threshold from the past into the future.


Our argument that J transformed like a vector was based on a 
case where all the charged particles had the same velocity vector, but 
the above description in terms of the flux of charge eliminated any 
discussion of velocity. It’s true, but less obvious, that the J described 
in this way also transforms as a vector, even in cases where the 
charged particles do not all have parallel world-lines. The current 
vector is the source of electric and magnetic fields. Remarkably, no 
macroscopic electrical measurement is capable of detecting anything 
more detailed about the motion of the charges than the averaged 
information provided by J. 



b / Example 1. 


Boosting a solenoid Example 1 

The figure shows a solenoid, at rest, wound from copper wire. 
At point P, we construct a rectangular Amperian loop in the yz 
plane that has its right edge inside the solenoid and its left one 
outside. Ampere’s law, f B • ds = {4nk/c 2 )l, then tells us that the 
current density J x causes a difference between the exterior field 
B z = 0 and the interior field B z = ( 4nk / c 2 )J x Ay , where Ay is the 
thickness of the solenoid. There are two things we can get from 
this result, both of them nontrivial. 

First, the field depends only on the current density, not on any 
information about the details of the motion of the electrons in the 
copper. The electrons’ motion is fast and highly random, but all 
that contributes to $J^x$ is the slow drift velocity, typically ~ 1 cm/s,
superimposed on the randomness. This is exact and not at all
obvious. For example, the total momentum of the electrons does
depend on the random part of their motion, because $p_x = m\gamma v_x$
has a factor of $\gamma$ in it.


Second, we can use the transformation properties of the current 
vector to find the field of this solenoid in a frame boosted along 
its axis. This is the kind of situation that would naturally arise, 
for example, in an electric motor whose rotor contains an
electromagnet. A Lorentz transformation in the z direction doesn’t
change the x component of a vector, nor does it change $\Delta y$, so
$B_z$ is the same in both frames. This is nontrivial both in the sense
that it would have been difficult to figure out by brute force and in
the sense that fields don’t have to be the same in different frames
of reference; for example, a boost in the x or the y direction
would have changed the result.

A wire Example 2 

In a solid conductor such as a copper wire, we have two types of
charges, protons and electrons. The protons are at rest in the lab
frame o, with charge density $\rho_p$ and current density

$$ J_p = (\rho_p, 0, 0, 0) $$

in Minkowski coordinates. The motion of the electrons is complicated.
Some electrons are bound to a particular atom, but still
move at relativistic speeds within their atoms. Others exhibit violent
thermal motion that very nearly, but not quite, averages out
to zero when there is a current measurable by an ammeter. For
simplicity, we treat all the electrons (both the bound ones and the
mobile ones) as a single density of charge $\rho_e$. Let the average
velocity of the electrons, known as their drift velocity, be v in the x
direction. Then in the frame o' moving along with the drift velocity
we have

$$ J'_e = (\rho'_e, 0, 0, 0) , $$

which under a Lorentz transformation back into the lab frame becomes

$$ J_e = (\rho'_e \gamma, \rho'_e v \gamma, 0, 0) . $$

Adding the two current vectors, we have a total current in the lab
frame

$$ J = (\rho_p + \rho'_e \gamma, \rho'_e v \gamma, 0, 0) . $$

The wire is electrically neutral in this frame, so $\rho_p + \rho'_e \gamma = 0$.
Since $\rho_p$ is a fixed property of the wire, we express $\rho'_e$ in terms
of it as $-\rho_p/\gamma$. Eliminating $\rho'_e$ gives

$$ J = (0, -\rho_p v, 0, 0) . $$


Because the $\gamma$ factors canceled, we find that the current is exactly
proportional to the drift velocity. Geometrically, we have added
two timelike vectors and gotten a spacelike one; this is possible
because one of the timelike vectors was future-directed and the
other past-directed.
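
The bookkeeping in example 2 can be verified numerically. The sketch below is illustrative only: the names boost_x and norm2 are hypothetical, the drift velocity is exaggerated enormously so that the factors of gamma are visible, and the +--- signature is assumed for the squared magnitude.

```python
import math

def boost_x(vec, v):
    """Components in the lab frame of a four-vector given in a frame that
    moves at velocity v along x relative to the lab (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    t, x, y, z = vec
    return (gamma * (t + v * x), gamma * (x + v * t), y, z)

def norm2(vec):
    """Squared magnitude with the +--- signature."""
    t, x, y, z = vec
    return t * t - x * x - y * y - z * z

rho_p = 1.0     # proton charge density in the lab frame, arbitrary units
v = 0.3         # drift velocity, unrealistically large for illustration
gamma = 1.0 / math.sqrt(1.0 - v * v)

rho_e_prime = -rho_p / gamma           # fixed by neutrality in the lab frame
J_p = (rho_p, 0.0, 0.0, 0.0)
J_e = boost_x((rho_e_prime, 0.0, 0.0, 0.0), v)
J = tuple(a + b for a, b in zip(J_p, J_e))

print(J)                               # approximately (0, -rho_p * v, 0, 0)
print(norm2(J_p), norm2(J_e), norm2(J))
```

The two positive squared magnitudes belong to the timelike vectors for the protons and the electrons, and the negative one to their spacelike sum.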
c / 1. Charge is not conserved. Charges mysteriously appear at
a later time without having been present before. 2. Charge is
conserved. Although more world-lines come out through the top
of the box than came in through the bottom, the discrepancy is
accounted for by others that entered through the sides.


9.1.2 Conservation of charge 

Conservation of charge can be expressed elegantly in terms of J. 
Charge density is the timelike component $J^t$. If this charge density
near a certain point is, for example, increasing, then it might be
because charge conservation has been violated as in figure c/1. In
this example, more world-lines emerge into the future at the top
of the four-cube than had entered through the bottom in the past.
Some process inside the cube is creating charge. In the limit where
the cube is made very small, this would be measured by a value of
$\partial J^t/\partial t$ that was greater than zero.

But experiments have never detected any violation of charge
conservation, so if more charge is emerging from the top (future) side of
the cube than came in from the bottom (past), the more likely
explanation is that the charges are not all at rest, as in c/1, but are
moving, c/2, and there has been a net flow in from neighboring regions
of space. We should find this reflected in the spatial components $J^x$,
$J^y$ and $J^z$. Moreover, if these spatial components were all constant,
then any given region of space would have just as much current flowing
into it from one side as there was flowing out the other. We therefore
need to have some nonzero partial derivatives such as $\partial J^x/\partial x$.
For example, figure c/2 has a positive $J^x$ on the left and a negative
$J^x$ on the right, so $\partial J^x/\partial x < 0$. Charge conservation is
expressed by the simple equation $\partial J^\lambda/\partial x^\lambda = 0$, with an
implied sum over the index $\lambda$. Writing out the sum, this says that
$\partial J^t/\partial t + \partial J^x/\partial x + \partial J^y/\partial y + \partial J^z/\partial z = 0$.
If you’ve taken vector calculus, you’ll recognize the operator being
applied to J as a four-dimensional generalization of the divergence.
This charge-conservation equation is valid regardless of the coordinate
system, so it can also be rewritten in abstract index notation as

$$ \frac{\partial J^a}{\partial x^a} = 0 . \qquad (1) $$



Conservation of charge in a solenoid Example 3 

In a solenoid, we have charge circulating at some drift velocity v. 
Ignoring the protons, and adapting the relevant expression from 
example 2 to the case of circular rather than linear motion, we 
might have for the electrons’ contribution to the current something 
of the form 


$$ J = p(1, -qy, qx, 0) , $$


where $p = \gamma\rho'_e$, as in example 2, and q depends on v and on the
radius of the solenoid. Conservation of charge is satisfied, because
each of the four terms in the equation
$\partial J^t/\partial t + \partial J^x/\partial x + \partial J^y/\partial y + \partial J^z/\partial z = 0$
vanishes individually.
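
As a rough numerical check of example 3, the four terms of the divergence can be estimated by central differences. This is only a sketch: p and q are treated as constants with arbitrary values, and the function names are hypothetical. A second, deliberately non-conserved current is included for contrast.

```python
def solenoid_current(t, x, y, z, p=2.0, q=0.5):
    """Current of the form J = p(1, -q*y, q*x, 0), with p and q taken to be
    constants (arbitrary illustrative values)."""
    return (p, -p * q * y, p * q * x, 0.0)

def creating_charge(t, x, y, z):
    """A current whose charge density grows with time while nothing flows,
    as in figure c/1; charge is not conserved."""
    return (t, 0.0, 0.0, 0.0)

def divergence_terms(J, event, h=1e-4):
    """Central-difference estimates of dJ^t/dt, dJ^x/dx, dJ^y/dy, dJ^z/dz."""
    terms = []
    for a in range(4):
        plus, minus = list(event), list(event)
        plus[a] += h
        minus[a] -= h
        terms.append((J(*plus)[a] - J(*minus)[a]) / (2.0 * h))
    return terms

event = (0.3, 1.2, -0.7, 0.4)          # an arbitrary event (t, x, y, z)
conserved = divergence_terms(solenoid_current, event)
violated = divergence_terms(creating_charge, event)

print(conserved, sum(conserved))
print(violated, sum(violated))
```

Each term vanishes separately for the solenoid current, while the second current has a divergence close to 1.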
9.2 The stress-energy tensor

9.2.1 Conservation and flux of energy-momentum 

A particle such as an electron has a charge, but it also has a 
mass. We can’t define a relativistic mass flux because flux is de- 
fined by addition, but mass isn’t additive in relativity (example 6, 
p. 89). Mass-energy is additive, but unlike charge it isn’t an
invariant. Mass-energy is part of the energy-momentum four-vector
$p = (E, p^x, p^y, p^z)$. We then have sixteen different fluxes we can
define. For example, we could replay the description in section 9.1
of the three-surface S perpendicular to the x direction, but now we
would be interested in a quantity such as the z component of
momentum. We then have a measure of the density of flux of $p^z$ in
the x direction, which we notate as $T^{zx}$. The matrix T is called
the stress-energy tensor, and it is an object of central importance 
in relativity. (The reason for the odd name will become more clear 
in a moment.) In general relativity, it is the source of gravitational 
fields. 

The stress-energy tensor is related to physical measurements as 
follows. Let o be the future-directed, normalized velocity vector of 
an observer; let s express a spatial direction according to this
observer, i.e., it points in a direction of simultaneity and is normalized
with $s \cdot s = -1$; and let S be a three-volume covector (p. 156),
directed toward the future (i.e., $o^a S_a > 0$). Then measurements by
this observer come out as follows:

$T^{ab} o_a S_b$ = mass-energy inside the three-volume S (2a)

$T^{ab} s_a S_b$ = momentum in the direction s, inside S (2b)

The stress-energy tensor allows us to express conservation of
energy-momentum as

$$ \frac{\partial T^{ab}}{\partial x^a} = 0 . \qquad (3) $$

This local conservation of energy-momentum is all we get in general 
relativity. As discussed in section 4.3.6, p. 97, there is no such 
global law in curved spacetime. However, we will show in section 
9.3.4 that in the special case of flat spacetime, i.e., special relativity, 
we do have such a global conservation law. 

9.2.2 Symmetry of the stress-energy tensor 

The stress-energy tensor is a symmetric matrix. For example, 
let’s say we have some nonrelativistic particles. If we have a nonzero
$T^{tx}$, it represents a flux of mass-energy ($p^t$) through a three-surface
perpendicular to x. This means that mass is moving in the x direction.
But if mass is moving in the x direction, then we have some
x momentum $p^x$. Therefore we must also have a $T^{xt}$, since this
momentum is carried by the particles, whose world-lines pass through
a hypersurface of simultaneity.

9.2.3 Dust


The simplest example of a stress-energy tensor would be a cloud 
of particles, all at rest in a certain frame of reference, described in 
Minkowski coordinates: 


$$ T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , $$

where we now use $\rho$ to indicate the density of mass-energy, not
charge as in section 9.1. This could be the stress-energy tensor of
a stack of oranges at the grocery store, the atoms in a hunk of
copper, or the galaxies in some small neighborhood of the universe.
Relativists refer to this type of matter, in which the velocities are
negligible, as “dust.” The nonvanishing component $T^{tt}$ indicates
that for a three-surface S perpendicular to the t axis, particles with
mass-energy $E = p^t$ are crossing that surface from the past to the
future. Conservation of energy-momentum is satisfied, since all the
elements of this T are constant, so all the partial derivatives vanish.


9.2.4 Rank-2 tensors and their transformation law 


Suppose we were to look at this cloud in a different frame of 
reference. Some or all of the timelike row $T^{t\nu}$ and timelike column
$T^{\mu t}$ would fill in because of the existence of momentum, but let’s
just focus for the moment on the change in the mass-energy density
represented by $T^{tt}$. It will increase for two reasons. First, the kinetic
energy of each particle is now nonzero; its mass-energy increases
from m to $m\gamma$. But in addition, the volume occupied by the cloud
has been reduced by $1/\gamma$ due to length contraction. We’ve picked up
two factors of gamma, so the result is $\rho \rightarrow \rho\gamma^2$. This is different
from the transformation behavior of a vector. When a vector is purely
timelike in one frame, transformation to another frame raises its
timelike component only by a factor of $\gamma$, not $\gamma^2$. This tells us that
a matrix like T transforms differently than a vector (section 7.2,
p. 145). The general rule is that if we transform from coordinates x
to x', then:

$$ T'^{\mu\nu} = T^{\kappa\lambda} \frac{\partial x'^\mu}{\partial x^\kappa} \frac{\partial x'^\nu}{\partial x^\lambda} \qquad (4) $$

An object that transforms in this standard way is called a rank-2 
tensor. The 2 is because it has two indices. Vectors and covectors 
have rank 1, invariants rank 0. 


In section 7.3, p. 146, we developed a method of transforming 
the metric from one set of coordinates to another; we now see that 
technique as an application of the more general rule given in
equation (4). Considered as a tensor, the metric is symmetric,
$g_{ab} = g_{ba}$. In most of the examples we’ve been considering, the
metric tensor is diagonal, but when it has off-diagonal elements, each
of these is one half the corresponding coefficient in the expression
for $ds^2$, as in the following example.

A non-diagonal metric tensor Example 4

The answer to problem 2 on p. 160 was the metric

$$ ds^2 = dx^2 + dy^2 + 2\cos\phi \, dx \, dy . $$

Writing this in terms of the metric tensor, we have

$$ \begin{aligned} ds^2 &= g_{\mu\nu} \, dx^\mu \, dx^\nu \\ &= g_{xx} \, dx^2 + g_{xy} \, dx \, dy + g_{yx} \, dy \, dx + g_{yy} \, dy^2 \\ &= g_{xx} \, dx^2 + 2 g_{xy} \, dx \, dy + g_{yy} \, dy^2 . \end{aligned} $$

Therefore we have $g_{xy} = \cos\phi$, not $g_{xy} = 2\cos\phi$.
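
The factor of one half in example 4 can be confirmed with a few lines of Python. This is a sketch with arbitrary illustrative numbers; the function name interval_squared is hypothetical.

```python
import math

def interval_squared(g, d):
    """ds^2 = g_{mu nu} dx^mu dx^nu, summed over both indices."""
    size = len(d)
    return sum(g[i][j] * d[i] * d[j] for i in range(size) for j in range(size))

phi = math.pi / 3.0                    # arbitrary angle between the axes
d = (0.3, -0.4)                        # arbitrary displacement (dx, dy)

# Metric tensor with the off-diagonal elements equal to cos(phi).
g = [[1.0, math.cos(phi)],
     [math.cos(phi), 1.0]]

# The same quantity written out as dx^2 + dy^2 + 2 cos(phi) dx dy.
direct = d[0] ** 2 + d[1] ** 2 + 2.0 * math.cos(phi) * d[0] * d[1]

# The tempting but wrong choice g_xy = 2 cos(phi) double-counts the cross term.
g_wrong = [[1.0, 2.0 * math.cos(phi)],
           [2.0 * math.cos(phi), 1.0]]

print(interval_squared(g, d), direct, interval_squared(g_wrong, d))
```

The first two printed numbers agree, while the third does not.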

Dust in a different frame Example 5 

We start with the stress-energy tensor of the cloud of particles, in 
the rest frame of the particles. 


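$$ T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} $$

Under a boost by v in the x direction, the tensor transformation
law gives

$$ T^{\mu'\nu'} = \begin{pmatrix} \gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\ \gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . $$

The over-all factor of $\gamma^2$ arises for the reasons previously
described.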
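
The result of example 5 follows mechanically from the transformation law (4), as the following Python sketch shows. The function names are hypothetical, units are such that c = 1, and the sign of the boost is chosen so that the off-diagonal elements come out positive as in the matrix above.

```python
import math

def boost_matrix_x(v):
    """Matrix of partial derivatives dx'^mu/dx^kappa for a boost along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, gamma * v, 0.0, 0.0],
            [gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_rank2(L, T):
    """T'^{mu nu} = L^mu_kappa L^nu_lambda T^{kappa lambda}, equation (4)."""
    size = len(T)
    return [[sum(L[mu][k] * L[nu][l] * T[k][l]
                 for k in range(size) for l in range(size))
             for nu in range(size)]
            for mu in range(size)]

rho = 1.0       # rest-frame mass-energy density, arbitrary units
v = 0.6         # boost velocity, so that gamma = 1.25
gamma = 1.0 / math.sqrt(1.0 - v * v)

T_rest = [[rho, 0.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)]
T_boosted = transform_rank2(boost_matrix_x(v), T_rest)

for row in T_boosted:
    print([round(component, 4) for component in row])
```

The tt component is rho times gamma squared, not rho times gamma, which is the behavior that distinguishes a rank-2 tensor from a vector.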


Parity Example 6 

The parity transformation is a change of coordinates that looks 
like this: 


t' = t
x' = -x

y' = -y

z' = -z

It turns right-handed screws into left-handed ones, but leaves the
arrow of time unchanged. Under this transformation, the tensor
transformation law tells us that some of the components of the
stress-energy tensor will flip their signs, while others will stay the
same:

$$ \begin{pmatrix} \text{no flip} & \text{flip} & \text{flip} & \text{flip} \\ \text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\ \text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\ \text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \end{pmatrix} $$

Everything here was based solely on the fact that T was a rank-2
tensor expressed in Minkowski coordinates, and therefore the
same parity properties hold for other rank-2 tensors as well; cf.
example 1, p. 220.
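
The pattern of sign flips in example 6 can be generated directly from the transformation law. In this minimal sketch (variable names are illustrative), the parity transformation is diagonal, so each component of a rank-2 tensor is simply multiplied by the product of two diagonal entries.

```python
# Diagonal entries of dx'^mu/dx^mu for t' = t, x' = -x, y' = -y, z' = -z.
parity = [1, -1, -1, -1]
labels = ["t", "x", "y", "z"]

# For a diagonal transformation, T'^{mu nu} = parity[mu] * parity[nu] * T^{mu nu}.
pattern = [["flip" if parity[mu] * parity[nu] < 0 else "no flip"
            for nu in range(4)]
           for mu in range(4)]

for label, row in zip(labels, pattern):
    print(label, row)
```

Only the time-space components change sign, because they are the only ones that pick up an odd number of factors of -1.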

9.2.5 Pressure 

The stress-energy tensor carries information about pressure. For 
example, $T^{xx}$ is the flux in the x direction of x-momentum. This is
simply the pressure, P, that would be exerted on a surface with its 
normal in the x direction. Negative pressure is tension, and this is 
the origin of the term “tensor,” coined by Levi-Civita (see p. 154). 

Pressure as a source of gravitational fields Example 7 

Because the stress-energy tensor is the source of gravitational 
fields in general relativity, we can see that the gravitational field of 
an object should be influenced not just by its mass-energy but by 
its internal stresses. The very early universe was dominated by 
photons rather than by matter, and photons have a much higher 
ratio of momentum to mass-energy than matter, so the impor- 
tance of the pressure components in the stress-energy tensor 
was much greater in that era. In the universe today, the largest 
pressures are those found inside atomic nuclei. Inside a heavy 
nucleus, the electromagnetic pressure can be as high as $10^{33}$ Pa!
If general relativity’s description of pressure as a source of
gravitational fields were wrong, then we would see anomalous effects
in the gravitational forces exerted by heavy elements compared
to light ones. Such effects have been searched for both in the
laboratory¹ and in lunar laser ranging experiments,² with results
that agreed with general relativity’s predictions. 

9.2.6 A perfect fluid 

The cloud in example 5 had a stress-energy tensor in its own rest 
frame that was isotropic, i.e., symmetric with respect to the x, y, 
and z directions. The tensor became anisotropic when we switched 
out of this frame. If a physical system has a frame in which its 
stress-energy tensor is isotropic, i.e., of the form 

$$ T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{pmatrix} , $$

we call it a perfect fluid in equilibrium. Although it may contain
moving particles, this special frame is the one in which their
momenta cancel out. In other cases, the pressure need not be isotropic,
and the stress exerted by the fluid need not be perpendicular to the
surface on which it acts. The space-space components of T would
then be the classical stress tensor, whose diagonal elements are the
anisotropic pressure, and whose off-diagonal elements are the shear
stress. This is the reason for calling T the stress-energy tensor.

1 Kreuzer, Phys. Rev. 169 (1968) 1007. Described in section 3.7.3 of Will,
“The Confrontation between General Relativity and Experiment,”
relativity.livingreviews.org/Articles/lrr-2006-3/.

2 Bartlett and van Buren, Phys. Rev. Lett. 57 (1986) 21, also described in
Will.

The perfect fluid form of the stress-energy tensor is extremely 
important and common. For example, cosmologists find that it is a 
nearly perfect description of the universe on large scales. 

We discussed in section ?? the ideas of converting back and forth
between vectors and their corresponding covectors, and of notating
this as the raising and lowering of indices. We can do the same thing
with the two indices of a rank-2 tensor, so that the stress-energy
tensor can be expressed in four different ways: $T^{ab}$, $T_{ab}$,
$T^a{}_b$, and $T_a{}^b$, but the symmetry of T means that there is no
interesting distinction between the final two of these. In special
relativity, the distinctions among the various forms are not especially
fascinating. We can always cover all of spacetime with Minkowski
coordinates, so that the form of the metric is simply a diagonal matrix
with elements ±1 on the diagonal. As with a rank-1 tensor, raising and
lowering indices on a rank-2 tensor just flips some components and
leaves others alone. The methods for raising and lowering don’t
need to be deduced or memorized, since they follow uniquely from
the grammar of index notation, e.g., $T^a{}_b = g_{bc} T^{ac}$. But there is
the potential for a lot of confusion with all the signs, and in
addition there is the fact that some people use a $+---$ signature
while others use $-+++$. Since perfect fluids are so important, I’ll
demonstrate how all of this works out in that case.

For a perfect fluid, we can write the stress-energy tensor in the
coordinate-independent form

$$T^{ab} = (\rho + P) o^a o^b - (o^c o_c) P g^{ab},$$

where $o$ represents the velocity vector of an observer in the fluid’s
rest frame, and $o^c o_c = o^2 = o \cdot o$ equals 1 for our $+---$
signature or $-1$ for the signature $-+++$. For ease of writing, let’s
abbreviate the signature factor as $s = o^c o_c$.

Suppose that the metric is diagonal, but its components are
varying, $g_{\alpha\beta} = \operatorname{diag}(sA^2, -sB^2, \ldots)$.
The properly normalized velocity vector of an observer at
(coordinate-)rest is $o^\alpha = (A^{-1}, 0, 0, 0)$. Lowering the index
gives $o_\alpha = (sA, 0, 0, 0)$. The various forms of the
stress-energy tensor then look like the following:

$$\begin{aligned}
T_{00} &= A^2 \rho & T_{11} &= B^2 P \\
T^0{}_0 &= s\rho & T^1{}_1 &= -sP \\
T^{00} &= A^{-2} \rho & T^{11} &= B^{-2} P.
\end{aligned}$$

Which of these forms is the “real” one, e.g., which form of the 00 
component is the one that the observer o actually measures when 
she sticks a shovel in the ground, pulls out a certain volume of dirt,
weighs it, and determines $\rho$? The answer is that the index notation
is so slick and well designed that all of them are equally “real,” and
we don’t need to memorize which actually corresponds to
measurements. When she does this measurement with the shovel, she could
say that she is measuring the quantity $T^{ab} o_a o_b$. But because all
of the $a$’s and $b$’s are paired off, this expression is a rank-0
tensor. That means that $T^{ab} o_a o_b$, $T_{ab} o^a o^b$, and
$T^a{}_b o_a o^b$ are all the same number.
If, for example, we have coordinates in which the metric is diagonal 
and has elements ±1, then in all these expressions the differing signs 
of the o’s are exactly compensated for by the signs of the T’s. 
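The claim that every index placement gives the same measured number can be checked numerically. The following is a minimal sketch, assuming a 1+1-dimensional diagonal metric $\operatorname{diag}(sA^2, -sB^2)$ and the perfect-fluid form given above; the function name and the sample values of $\rho$, $P$, $A$, and $B$ are hypothetical choices made for illustration.

```python
def observed_energy_density(rho, P, A, B, s):
    """Contract the perfect-fluid tensor with the rest observer in three index placements.

    Works in 1+1 dimensions with metric g = diag(s*A**2, -s*B**2), s = +1 or -1.
    Returns (T^{ab} o_a o_b, T_{ab} o^a o^b, T^a_b o_a o^b).
    """
    g_lower = [s * A**2, -s * B**2]           # g_{ab}, diagonal entries
    g_upper = [1.0 / g for g in g_lower]      # g^{ab}, inverse of a diagonal matrix
    o_up = [1.0 / A, 0.0]                     # o^a for an observer at coordinate rest
    o_down = [g_lower[i] * o_up[i] for i in range(2)]   # o_a = g_{ab} o^b

    # Diagonal of T^{ab} = (rho + P) o^a o^b - s P g^{ab}
    T_up = [(rho + P) * o_up[i] ** 2 - s * P * g_upper[i] for i in range(2)]
    T_mixed = [T_up[i] * g_lower[i] for i in range(2)]        # T^a_b
    T_down = [T_up[i] * g_lower[i] ** 2 for i in range(2)]    # T_{ab}

    via_upper = sum(T_up[i] * o_down[i] * o_down[i] for i in range(2))
    via_lower = sum(T_down[i] * o_up[i] * o_up[i] for i in range(2))
    via_mixed = sum(T_mixed[i] * o_down[i] * o_up[i] for i in range(2))
    return via_upper, via_lower, via_mixed


# Both signature conventions should return rho three times over.
for s in (+1, -1):
    print(s, observed_energy_density(rho=5.0, P=0.7, A=2.0, B=3.0, s=s))
```

In this sketch the three contractions agree with each other and with $\rho$ for either sign of $s$, even though the individual components $T_{00}$, $T^0{}_0$, and $T^{00}$ differ.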


9.2.7 Two simple examples 


A rope under tension Example 8 

As a real-world example in which the pressure is not isotropic, 
consider a rope that is moving inertially but under tension, i.e., 
equal forces at its ends cancel out so that the rope doesn’t ac- 
celerate. Tension is the same as negative pressure. If the rope 
lies along the x axis and its fibers are only capable of supporting 
tension along that axis, then the rope’s stress-energy tensor will 
be of the form 


$$T^{\mu\nu} = \begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & P & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},$$


where P is negative and equals minus the tension per unit cross- 
sectional area. 


Conservation of energy-momentum is expressed as (eq. 3, p. 179)

$$\frac{\partial T^{ab}}{\partial x^a} = 0.$$

Converting the abstract indices to concrete ones, we have

$$\frac{\partial T^{\mu\nu}}{\partial x^\mu} = 0,$$

where there is an implied sum over $\mu$, and the equation must hold
both in the case where $\nu$ is a label for $t$ and the one where it
refers to $x$.

In the first case, we have 


$$\frac{\partial T^{tt}}{\partial t} + \frac{\partial T^{xt}}{\partial x} = 0,$$

which is a statement of conservation of energy, energy being the
timelike component of the energy-momentum. The first term is
zero because $\rho$ is constant by virtue of our assumption that the
rope was uniform. The second term is zero because $T^{xt} = 0$.
Therefore conservation of energy is satisfied. This came about
automatically because by writing down a time-independent ex- 
pression for the stress-energy, we were dictating a static equilib- 
rium. 


When $\nu$ stands for $x$, we get an equation that requires the $x$
component of momentum to be conserved,

$$\frac{\partial T^{tx}}{\partial t} + \frac{\partial T^{xx}}{\partial x} = 0.$$

This simply says

$$\frac{\partial P}{\partial x} = 0,$$

meaning that the tension in the rope is constant along its length.

A rope supporting its own weight Example 9 

A variation on example 8 is one in which the rope is hanging 
and supports its own weight. Although gravity is involved, we 
can solve this problem without general relativity, by exploiting the 
equivalence principle (section 5.2, p. 120). As discussed in sec- 
tion 5.1 on p. 117, an inertial frame in relativity is one that is free- 
falling. We define an inertial frame of reference o, corresponding 
to an observer free-falling past the rope, and a noninertial frame 
o' at rest relative to the rope. 

Since the rope is hanging in static equilibrium, observer o' sees 
a stress-energy tensor that has no time-dependence. The off- 
diagonal components vanish in this frame, since there is no mo- 
mentum. The stress-energy tensor is 


$$T^{\mu'\nu'} = \begin{pmatrix}
\rho & 0 \\
0 & P
\end{pmatrix},$$


where the components involving y and z are zero and not shown, 
and P is negative as in example 8. We could try to apply the 
conservation of energy condition to this stress-energy tensor as 
in example 8, but that would be a mistake. As discussed in 7.5 
on p. 150, rates of change can only be measured by taking par- 
tial derivatives with respect to the coordinates if the coordinates 
are Minkowski, i.e., in an inertial frame. Therefore we need to 
transform this stress-energy tensor into the inertial frame o. 

For simplicity, we restrict ourselves to the Newtonian approxima- 
tion, so that the change of coordinates between the two frames 
is 


$$t \approx t'$$

$$x \approx x' + \frac{1}{2} a t'^2,$$

where $a > 0$ if the free-falling observer falls in the negative $x$
direction, i.e., positive $x$ is up. That is, if a point on the rope at a
fixed $x'$ is marked with a spot of paint, then free-falling observer $o$
sees the spot moving up, to larger values of $x$, at $t > 0$. Applying
the tensor transformation law, we find

$$T^{\mu\nu} = \begin{pmatrix}
\rho & \rho a t \\
\rho a t & P + \rho a^2 t^2
\end{pmatrix}.$$
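The transformation step can be spelled out as follows. This is a sketch that fills in the intermediate algebra from the Newtonian change of coordinates above; the Greek-index labels and the decision to drop the prime on $t$ in the final result (since $t \approx t'$) are choices made here for illustration.

```latex
\begin{align*}
\frac{\partial t}{\partial t'} &= 1, \qquad
\frac{\partial t}{\partial x'} = 0, \qquad
\frac{\partial x}{\partial t'} = a t', \qquad
\frac{\partial x}{\partial x'} = 1, \\
T^{\mu\nu} &= \frac{\partial x^\mu}{\partial x^{\alpha'}}
              \frac{\partial x^\nu}{\partial x^{\beta'}}\, T^{\alpha'\beta'}, \\
T^{tt} &= \left(\frac{\partial t}{\partial t'}\right)^2 \rho = \rho, \\
T^{tx} = T^{xt} &= \frac{\partial t}{\partial t'}\,
                   \frac{\partial x}{\partial t'}\,\rho = \rho a t, \\
T^{xx} &= \left(\frac{\partial x}{\partial t'}\right)^2 \rho
        + \left(\frac{\partial x}{\partial x'}\right)^2 P
        = P + \rho a^2 t^2 .
\end{align*}
```

Only the $\partial x/\partial t'$ entry of the Jacobian differs from the identity, which is why the new terms are all proportional to powers of $at$.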

As in example 8, conservation of energy is trivially satisfied.
Conservation of momentum gives

$$\frac{\partial T^{tx}}{\partial t} + \frac{\partial T^{xx}}{\partial x} = 0,$$

or

$$\rho a + \frac{\partial P}{\partial x} = 0.$$

Integrating this with respect to $x$, we have

$$P = -\rho a x + \text{constant}.$$

Let the cross-sectional area of the rope be $A$, and let $\mu = \rho A$ be
the mass per unit length and $T = -PA$ the tension. We then find

$$T = \mu a x + \text{constant}.$$

Conservation of momentum requires that the tension vary along 
the length of the rope, just as we expect from Newton’s laws: a 
section of the rope higher up has more weight below it to sup- 
port. 

9.2.8 Energy conditions 

The result of example 9 could cause something scary to happen. 
If we walk up to a clothesline under tension and give it a quick 
karate chop, we will observe wave pulses propagating away from the 
chop in both directions, at velocities $v = \pm\sqrt{T/\mu}$, where $\mu$ is the mass per unit length. But the result
of the example is that this expression increases without limit as x 
gets larger and larger. At some point, v will exceed the speed of 
light. (Of course any real rope would break long before this much 
tension was achieved.) Two things led to the problematic result: (1) 
we assumed there was no constraint on the possible stress-energy 
tensor in the rest frame of the rope; and (2) we used a Newtonian 
approximation to change from this frame to the free-falling frame. 
In reality, we don’t know of any material so stiff that vibrations 
propagate in it faster than c. In fact, all ordinary materials are made 
of atoms, atoms are bound to each other by electromagnetic forces, 
and therefore no material made of atoms can transmit vibrations 
faster than the speed of an electromagnetic wave, c. 

Based on these conditions, we therefore expect there to be cer- 
tain constraints on the stress-energy tensor of any ordinary form 
of matter. For example, we don’t expect to find any rope whose
stress-energy tensor looks like this:

$$T^{\mu\nu} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -2 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},$$

because here the tensile stress $+2$ is greater than the mass density
1, which would lead to $|v| = \sqrt{2/1} > 1$. Constraints of this kind are
called energy conditions. Hypothetical forms of matter that violate
them are referred to as exotic matter; if they exist, they are not made
of atoms. This particular example violates an energy condition
known as the dominant energy condition, which requires $\rho \ge 0$ and
$|P| \le \rho$. There are about five energy conditions that are commonly
used, and a detailed discussion of them is more appropriate for a 
general relativity text. The common ideas that recur in many of 
them are: (1) that energy density is never negative in any frame of 
reference, and (2) that there is never a flux of energy propagating 
at a speed greater than c. 

An energy condition that is particularly simple to express is the
trace energy condition (TEC),

$$T^a{}_a \ge 0,$$

where we have to have one upper index and one lower index in order
to obey the grammatical rules of index notation. In Minkowski
coordinates $(t,x,y,z)$, this becomes $T^\mu{}_\mu \ge 0$, with the
implied sum over $\mu$ expanding to give

$$T^t{}_t + T^x{}_x + T^y{}_y + T^z{}_z \ge 0.$$

The left-hand side of this relation, the sum of the main-diagonal
elements of a matrix, is called the trace of the matrix, hence the
name of this energy condition. Since this book uses the signature
$+---$ for the metric, raising the second index changes this to

$$T^{tt} - T^{xx} - T^{yy} - T^{zz} \ge 0.$$


In example 5 on p. 181, we computed the stress-energy tensor of a
cloud of dust, in a frame moving at velocity $v$ relative to the cloud’s
rest frame. The result was

$$T^{\mu\nu} = \begin{pmatrix}
\gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\
\gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.$$

In this example, the trace energy condition is satisfied precisely
under the condition $|v| \le 1$, which can be interpreted as a statement
that according to the TEC, the mass-energy of the cloud can never be
transported at a speed greater than $c$ in any frame.
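The two energy conditions discussed in this section can be tried out on the examples above. This is a minimal sketch under stated assumptions: the conditions are taken in the non-strict forms $T^\mu{}_\mu \ge 0$ and $\rho \ge 0$, $|P| \le \rho$; the function names are hypothetical; and the moving dust is parametrized by the energy density $e = \gamma^2\rho$ seen in the moving frame, so that $T^{tt} = e$ and $T^{xx} = e v^2$.

```python
def mixed_trace(diag_upper):
    """Trace T^mu_mu in the +--- signature from the diagonal (T^tt, T^xx, T^yy, T^zz)."""
    t_tt, t_xx, t_yy, t_zz = diag_upper
    return t_tt - t_xx - t_yy - t_zz


def satisfies_tec(diag_upper):
    """Trace energy condition, assumed here in the form T^mu_mu >= 0."""
    return mixed_trace(diag_upper) >= 0.0


def satisfies_dominant(rho, P):
    """Rest-frame test rho >= 0 and |P| <= rho for a rope with density rho and stress P."""
    return rho >= 0.0 and abs(P) <= rho


def moving_energy_diagonal(energy_density, v):
    """Diagonal of T^{mu nu} for energy density e carried along x at speed v.

    T^tt = e and T^xx = e v^2; for dust, e = gamma^2 rho in the moving frame.
    """
    return (energy_density, energy_density * v * v, 0.0, 0.0)


print(satisfies_tec(moving_energy_diagonal(1.0, 0.5)))   # subluminal transport
print(satisfies_tec(moving_energy_diagonal(1.0, 1.5)))   # superluminal transport
print(satisfies_dominant(rho=1.0, P=-2.0))               # the rope with diag(1, -2, 0, 0)
print(satisfies_tec((1.0, -2.0, 0.0, 0.0)))              # same rope, trace condition
```

Note that the exotic rope fails the dominant condition while still passing the trace condition, which illustrates that the various energy conditions are not equivalent to one another.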
9.3 Gauss’s theorem

9.3.1 Integral conservation laws

We’ve expressed conservation of charge and energy-momentum
in terms of zero divergences,

$$\frac{\partial J^a}{\partial x^a} = 0$$

$$\frac{\partial T^{ab}}{\partial x^a} = 0.$$

These are expressed in terms of derivatives. The derivative of a 
function at a certain point only depends on the behavior of the 
function near that point, so these are local statements of conser- 
vation. Conservation laws can also be stated globally: the total 
amount of something remains constant. Taking charge as an example,
observer $o$ defines Minkowski coordinates $(t,x,y,z)$, and at a
time $t_1$ says that the total amount of charge in some region is

$$q(t_1) = \int_{t_1} J^a\,dS_a,$$

where the subscript $t_1$ means that the integrand is to be evaluated
over the surface of simultaneity $t = t_1$, and
$dS_a = (dx\,dy\,dz, 0, 0, 0)$ is an element of 3-volume expressed as a
covector (p. 156). The charge at some later time $t_2$ would be given by
a similar integral. If charge is conserved, and if our region is
surrounded by an empty region through which no charge is coming in or
out, then we should have $q(t_2) = q(t_1)$.

d / Three lines go in, and three come out. These could be
field lines or world lines.

9.3.2 A simple form of Gauss’s theorem 

The connection between the local and global conservation laws 
is provided by a theorem called Gauss’s theorem. In your course 
on electromagnetism, you learned Gauss’s law, which relates the 
electric flux through a closed surface to the charge contained inside 
the surface. In the case where no charges are present, it says that 
the flux through such a surface cancels out. The interpretation is 
that since field lines only begin or end on charges, the absence of 
any charges means that the lines can’t begin or end, and therefore, 
as in figure d, any field line that enters the surface (contributing 
some negative flux) must eventually come back out (creating some 
positive flux that cancels out the negative). But there is nothing 
about figure d that requires it to be interpreted as a drawing of 
electric field lines. It could just as easily be a drawing of the world- 
lines of some charged particles in 1 + 1 dimensions. The bottom of 
the rectangle would then be the surface at $t_1$ and the top $t_2$. We
have $q(t_1) = 3$ and $q(t_2) = 3$ as well.

For simplicity, let’s start with a very restricted version of Gauss’s
theorem. Let a vector field $J^a$ be defined in two dimensions. (We
don’t care whether the two dimensions are both spacelike or one
spacelike and one timelike; that is, Gauss’s theorem doesn’t depend
on the signature of the metric.) Let $R$ be a rectangular area, and
let $S$ be its boundary. Define the flux of the field through $S$ as

$$\Phi = \int_S J^a\,dS_a,$$

where the integral is to be taken over all four sides, and the covector
$dS_a$ points outward. If the field has zero divergence,
$\partial J^a/\partial x^a = 0$, then the flux is zero.

Proof: Define coordinates $x$ and $y$ aligned with the rectangle.
Along the top of the rectangle, the element of the surface, oriented
outwards, is $dS = (0, dx)$, so the contribution to the flux from the
top is

$$\Phi_{\text{top}} = \int_{\text{top}} J^y(y_{\text{top}})\,dx.$$

At the bottom, an outward orientation gives $dS = (0, -dx)$, so

$$\Phi_{\text{bottom}} = -\int_{\text{bottom}} J^y(y_{\text{bottom}})\,dx.$$

Using the fundamental theorem of calculus, the sum of these is

$$\Phi_{\text{top}} + \Phi_{\text{bottom}} = \int_R \frac{\partial J^y}{\partial y}\,dy\,dx.$$

Adding in the similar expressions for the left and right, we get

$$\Phi = \int_R \left(\frac{\partial J^x}{\partial x} + \frac{\partial J^y}{\partial y}\right) dx\,dy.$$

But the integrand is the divergence, which is zero by assumption,
so $\Phi = 0$ as claimed.
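The rectangle version of the theorem is easy to test numerically. Below is a minimal sketch that sums the outward flux over the four sides with a midpoint rule; the function names, the two sample fields, and the particular rectangle are hypothetical choices made for illustration.

```python
def flux_through_rectangle(J, x0, x1, y0, y1, n=2000):
    """Outward flux of a 2D field J(x, y) -> (Jx, Jy) through the sides of a rectangle."""
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    flux = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx    # midpoint along the top and bottom sides
        flux += J(x, y1)[1] * hx   # top: outward element picks out +J^y
        flux -= J(x, y0)[1] * hx   # bottom: outward element picks out -J^y
        y = y0 + (i + 0.5) * hy    # midpoint along the right and left sides
        flux += J(x1, y)[0] * hy   # right: +J^x
        flux -= J(x0, y)[0] * hy   # left: -J^x
    return flux


def divergence_free(x, y):
    # dJx/dx + dJy/dy = 2x - 2x = 0
    return (x * x - y * y, -2.0 * x * y)


def uniform_source(x, y):
    # dJx/dx + dJy/dy = 2 everywhere
    return (x, y)


print(flux_through_rectangle(divergence_free, 0.0, 2.0, 1.0, 3.0))
print(flux_through_rectangle(uniform_source, 0.0, 2.0, 1.0, 3.0))
```

For the divergence-free field the four contributions cancel, while for the field with constant divergence 2 the flux equals 2 times the rectangle's area, consistent with the form of the theorem for nonzero divergence.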


9.3.3 The general form of Gauss’s theorem 

Although the coordinates were labeled $x$ and $y$, the proof made
no use of the metric, so the result is equally valid regardless of the
signature. The rectangle could equally well have been a rectangle in
1+1-dimensional spacetime. The generalization to $n$ dimensions is
also automatic, and everything also carries through without
modification if we replace the vector $J^a$ with a tensor such as
$T^{ab}$ that has more indices; the extra index $b$ just comes along
for the ride. Sometimes, as with Gauss’s law in electromagnetism, we
are interested in fields whose divergences are not zero. Gauss’s
theorem then becomes

$$\int_S J^a\,dS_a = \int_R \frac{\partial J^a}{\partial x^a}\,dv,$$

where $dv$ is the element of $n$-volume. In 3+1 dimensions we could
use Minkowski coordinates to write the element of 4-volume as
$dv = dt\,dx\,dy\,dz$, and even though this expression is written in
terms of these specific coordinates, it is actually Lorentz invariant
(section 2.5, p. 49).

e / Proof of Gauss’s theorem for a region with an arbitrary shape.

The generalization to a region R with an arbitrary shape, figure 
e, is less trivial. The basic idea is to break up the region into
rectangular boxes, e/1. Where the faces of two boxes coincide on the
interior of R, their own outward directions are opposite. Therefore 
if we add up the fluxes through the surfaces of all the boxes, the 
contributions on the interior cancel, and we’re left with only the 
exterior contributions. If R could be dissected exactly into boxes, 
then this would complete the proof, since the sum of exterior contri- 
butions would be the same as the flux through S, and the left-hand 
side of Gauss’s theorem would be additive over the boxes, as is the 
right-hand side. 

The difficulty arises because a smooth shape typically cannot be 
built out of bricks, a fact that is well known to Lego enthusiasts 
who build elaborate models of the Death Star. We could argue on 
physical grounds that no real-world measurement of the flux can 
depend on the granular structure of S at arbitrarily small scales, 
but this feels a little unsatisfying. For comparison, it is not strictly 
true that surface areas can be treated in this way. For example, if
we approximate a unit sphere in three dimensions using smaller and
smaller boxes, the limit of the surface area is $6\pi$, which is quite a
bit greater than the surface area $4\pi$ of the limiting surface.
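The failure of box approximations for surface area can be seen directly in a small computation. The sketch below, which is an illustration and not part of the original argument, builds a unit ball out of small cubes and counts the exposed faces; the function name and the grid sizes are hypothetical. The staircase surface has an area that tends toward $6\pi$ (twice the ball's shadow $\pi$ along each of the three axes), not toward the sphere's area $4\pi$.

```python
import math


def boxed_sphere_area(n):
    """Surface area of the union of cubes of edge 2/n whose centers lie in the unit ball."""
    h = 2.0 / n

    def inside(i, j, k):
        # Cells outside the n x n x n grid count as empty.
        if min(i, j, k) < 0 or max(i, j, k) >= n:
            return False
        x = -1.0 + (i + 0.5) * h
        y = -1.0 + (j + 0.5) * h
        z = -1.0 + (k + 0.5) * h
        return x * x + y * y + z * z < 1.0

    neighbors = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    exposed_faces = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if not inside(i, j, k):
                    continue
                # A face is part of the outer surface when the adjacent cell is empty.
                for di, dj, dk in neighbors:
                    if not inside(i + di, j + dj, k + dk):
                        exposed_faces += 1
    return exposed_faces * h * h


for n in (10, 20, 40):
    print(n, boxed_sphere_area(n), "vs 6*pi =", 6 * math.pi, "and 4*pi =", 4 * math.pi)
```

Refining the grid does not bring the staircase area any closer to $4\pi$, which is the behavior the flux argument below has to rule out.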

Instead, we explicitly consider the nonrectangular pieces at the 
surface, such as the one in e/2. In this drawing in n = 2 dimensions, 
the top of this piece is approximately a line, and in the limit we’ll be 
considering, where its width becomes an infinitesimally small dx, the 
error incurred by approximating it as a line will be negligible. We 
define vectors $dx$ and $dx^*$ as shown in the figure. In more than the
two dimensions shown in the figure, we would approximate the top
surface as an $(n-1)$-dimensional parallelepiped spanned by vectors
$dx^*$, $dy^*$, . . . This is the point at which the use of the covector $S_a$
(p. 156) pays off by greatly simplifying the proof.³ Applying this
to the top of the triangle, $dS$ is defined as the linear function that
takes a vector $J$ and gives the $n$-volume spanned by $J$ along with
$dx^*$, . . .

Call the vertical coordinate on the diagram $t$, and consider the
contribution to the flux from $J$’s time component, $J^t$. Because the
triangle’s size is an infinitesimal of order $dx$, we can approximate
$J^t$ as being a constant throughout the triangle, while incurring only
an error of order $dx$. (By stating Gauss’s theorem in terms of
derivatives of $J$, we implicitly assumed it to be differentiable, so it
is not possible for it to jump discontinuously.) Since $dS$ depends
linearly not just on $J$ but on all the vectors, the difference between
the flux at the top and bottom of the triangle is proportional to the
area spanned by $J$ and $dx^* - dx$. But the latter vector is in the
$t$ direction, and therefore the area it spans when taken with $J^t$ is
approximately zero. Therefore the contribution of $J^t$ to the flux
through the triangle is zero. To estimate the possible error due to
the approximations, we have to count powers of $dx$. The possible
variation of $J^t$ over the triangle is of order $(dx)^1$. The covector
$dS$ is of order $(dx)^{n-1}$, so the possible error in the flux is of
order $(dx)^n$.

3 Here is an example of the ugly complications that occur if one doesn’t
have access to this piece of technology. In the low-tech approach, in
Euclidean space, one defines an element of surface area
$d\mathbf{A} = \hat{\mathbf{n}}\,dA$, where the unit vector
$\hat{\mathbf{n}}$ is outward-directed with
$\hat{\mathbf{n}} \cdot \hat{\mathbf{n}} = 1$. But in a signature such
as $+---$, we could have a region $R$ such that over some large area of
the bounding surface $S$, the normal direction was lightlike. It would
therefore be impossible to scale $\mathbf{n}$ so that
$\mathbf{n} \cdot \mathbf{n}$ was anything but zero. As an example of
how much work it is to resolve such issues using stone-age tools, see
Synge, Relativity: The Special Theory, VIII, §6-7, where the complete
argument takes up 22 pages.

This was only an estimate of one part of the flux, the part
contributed by the component $J^t$. However, we get the same estimate
for the other parts. For example, if we refer to the two dimensions
in figure e/2 as $t$ and $x$, then interchanging the roles of $t$ and $x$
in the above argument produces the same error estimate for the
contribution from $J^x$.

This is good. When we began this argument, we were motivated
to be cautious by our observation that a quantity such as the surface
area of $R$ can’t be calculated as the limit of the surface area as
approximated using boxes. The reason we have that problem for
surface area is that the error in the approximation on a small patch
is of order $(dx)^{n-1}$, which is an infinitesimal of the same order as the
surface area of the patch itself. Therefore when we scale down the
boxes, the error doesn’t get small compared to the total area. But
when we consider flux, the error contributed by each of the irregularly
shaped pieces near the surface goes like $(dx)^n$, which is of the order
of the $n$-volume of the piece. This volume goes to zero in the limit
where the boxes get small, and therefore the error goes to zero as
well. This establishes the generalization of Gauss’s theorem to a
region $R$ of arbitrary shape.

9.3.4 The energy-momentum vector 

Einstein’s celebrated $E = mc^2$ is a special case of the statement
that energy-momentum is conserved, transforms like a four-vector, 
and has a norm m equal to the rest mass. Section 4.4 on p. 98 
explored some of the problems with Einstein’s original attempt at a 
proof of this statement, but only now are we prepared to completely 
resolve them. One of the problems was the definitional one of what 
we mean by the energy-momentum of a system that is not composed 
of pointlike particles. The answer is that for any phenomenon that 
carries energy-momentum, we must decide how it contributes to the 
stress-energy tensor. For example, the stress-energy tensor of the 
electric and magnetic fields is described in section 10.6 on p. 226. 

f / Conservation of the integrated energy-momentum vector.


For the reasons discussed in section 4.4 on p. 98, it is necessary to
assume that energy-momentum is locally conserved, and also that
the system being described is isolated. Local conservation is
described by the zero-divergence property of the stress-energy tensor,
$\partial T^{ab}/\partial x^a = 0$. Once we assume local conservation,
figure f shows how to prove conservation of the integrated
energy-momentum vector using Gauss’s theorem. Fix a frame of reference
$o$. Surrounding the system, shown as a dark stream flowing through
spacetime, we draw a box. The box is bounded on its past side by a
surface that $o$ considers to be a surface of simultaneity $s_A$, and
likewise on the future side $s_B$. It doesn’t actually matter if the
sides of the box are straight or curved according to $o$. What does
matter is that because the system is isolated, we have enough room so
that between the system and the sides of the box there can be a region
of vacuum, in which the stress-energy tensor vanishes.

Observer $o$ says that at the initial time corresponding to $s_A$, the
total amount of energy-momentum in the system was

$$p_A^\mu = -\int_{s_A} T^{\mu\nu}\,dS_\nu,$$

where the minus sign occurs because we take $dS_\nu$ to point outward,
for compatibility with Gauss’s theorem, and this makes it antiparallel
to the velocity vector $o$, which is the opposite of the orientation
defined in equations (2) on p. 179. At the final time we have

$$p_B^\mu = \int_{s_B} T^{\mu\nu}\,dS_\nu,$$

with a plus sign because the outward direction is now the same as
the direction of $o$. Because of the vacuum region, there is no flux
through the sides of the box, and therefore by Gauss’s theorem
$p_B^\mu - p_A^\mu = 0$. The energy-momentum vector has been globally
conserved according to $o$.

g / Lorentz transformation of the integrated energy-momentum vector.

We also need to show that the integrated energy-momentum
transforms properly as a four-vector. To prove this, we apply Gauss’s
theorem to the region shown in figure g, where $s_C$ is a surface of
simultaneity according to some other observer $o'$. Gauss’s theorem
tells us that $p_B = p_C$, which means that the energy-momentum on
the two surfaces is the same vector in the absolute sense, but
this doesn’t mean that the two vectors have the same components
as measured by different observers. Observer $o$ says that $s_B$ is a
surface of simultaneity, and therefore considers $p_B$ to be the total
energy-momentum at a certain time. She says the total mass-energy
is $p_B^\mu o_\mu$ (eq. (2a), p. 179), and similarly for the total
momentum in the three spatial directions $s_1$, $s_2$, and $s_3$
(eq. (2b)). Observer $o'$, meanwhile, considers $s_C$ to be a surface of
simultaneity, and has the same interpretations for quantities such as
$p_C^\mu o'_\mu$. But this is just a way of saying that $p^\mu$ and
$p^{\mu'}$ are related to each other by a change of basis from
$(o, s_1, s_2, s_3)$ to $(o', s'_1, s'_2, s'_3)$. A change of
basis like this is just what we mean by a Lorentz transformation, so
the integrated energy-momentum $p$ transforms as a four-vector.


9.3.5 Angular momentum 

In sec. 8.2.2, p. 167, we gave physical and mathematical
plausibility arguments for defining relativistic angular momentum as
$L^{ab} = r^a p^b - r^b p^a$. We can now show that this quantity is
actually conserved. Just as the flux of energy-momentum $p^a$ is the
stress-energy tensor $T^{ab}$, we can take the angular momentum $L^{ab}$
and define its flux $\lambda^{abc} = r^a T^{bc} - r^b T^{ac}$. An observer
with velocity vector $o^c$ says that the density of energy-momentum is
$T^{ac} o_c$ and the density of angular momentum is $\lambda^{abc} o_c$.
If we can show that the divergence of $\lambda$ with respect to its third
index is zero, then it follows that angular momentum is conserved. The
divergence is

$$\frac{\partial \lambda^{abc}}{\partial x^c}
  = \frac{\partial}{\partial x^c}\left(r^a T^{bc} - r^b T^{ac}\right).$$

The product rule gives

$$\frac{\partial \lambda^{abc}}{\partial x^c}
  = \delta^a_c T^{bc} + r^a \frac{\partial}{\partial x^c} T^{bc}
  - \delta^b_c T^{ac} - r^b \frac{\partial}{\partial x^c} T^{ac},$$

where $\delta^i_j$, called the Kronecker delta, is defined as 1 if $i = j$
and 0 if $i \ne j$. The divergence of the stress-energy tensor is zero, so
the second and fourth terms vanish, and

$$\frac{\partial \lambda^{abc}}{\partial x^c}
  = \delta^a_c T^{bc} - \delta^b_c T^{ac}
  = T^{ba} - T^{ab},$$

but this is zero because the stress-energy tensor is symmetric.


9.4 * The covariant derivative

In this optional section we deal with the issues raised in section
7.5 on p. 150. We noted there that in non-Minkowski coordinates,
one cannot naively use changes in the components of a vector as a 
measure of a change in the vector itself. A constant scalar function 
remains constant when expressed in a new coordinate system, but 
the same is not true for a constant vector function, or for any tensor 
of higher rank. This is because the change of coordinates changes 
the units in which the vector is measured, and if the change of 
coordinates is nonlinear, the units vary from point to point. This 
topic doesn’t logically belong in this chapter, but I’ve placed it here 
because it can’t be discussed clearly without already having covered 
tensors of rank higher than one. 

Consider the one-dimensional case, in which a vector $v^a$ has only
one component, and the metric is also a single number, so that we
can omit the indices and simply write $v$ and $g$. (We just have to
remember that $v$ is really a vector, even though we’re leaving out
the upper index.) If $v$ is constant, its derivative $dv/dx$, computed in
the ordinary way without any correction term, is zero. If we further
assume that the metric is simply the constant $g = 1$, then zero is
not just the answer but the right answer.

h / These three rulers represent three choices of coordinates.

Now suppose we transform into a new coordinate system $X$, and
the metric $G$, expressed in this coordinate system, is not constant.
Applying the tensor transformation law, we have $V = v\,dX/dx$,
and differentiation with respect to $X$ will not give zero, because the
factor $dX/dx$ isn’t constant. This is the wrong answer: $V$ isn’t
really varying, it just appears to vary because $G$ does.


We want to add a correction term onto the derivative operator
$d/dX$, forming a new derivative operator $\nabla_X$ that gives the right
answer. $\nabla_X$ is called the covariant derivative. This correction term
is easy to find if we consider what the result ought to be when
differentiating the metric itself. In general, if a tensor appears to vary,
it could vary either because it really does vary or because the metric
varies. If the metric itself varies, it could be either because the
metric really does vary or . . . because the metric varies. In other
words, there is no sensible way to assign a nonzero covariant
derivative to the metric itself, so we must have $\nabla_X G = 0$. The
required correction therefore consists of replacing $d/dX$ with

$$\nabla_X = \frac{d}{dX} - G^{-1}\frac{dG}{dX}.$$

Applying this to $G$ gives zero. $G$ is a second-rank tensor with two
lower indices. If we apply the same correction to the derivatives of
other tensors of this type, we will get nonzero results, and they will
be the right nonzero results.


Mathematically, the form of the derivative is $(1/y)\,dy/dx$, which
is known as a logarithmic derivative, since it equals $d(\ln y)/dx$. It
measures the multiplicative rate of change of $y$. For example, if
$y$ scales up by a factor of $k$ when $x$ increases by 1 unit, then the
logarithmic derivative of $y$ is $\ln k$. The logarithmic derivative of
$e^{cx}$ is $c$. The logarithmic nature of the correction term to $\nabla_X$ is a
good thing, because it lets us take changes of scale, which are
multiplicative changes, and convert them to additive corrections to the
derivative operator. The additivity of the corrections is necessary if
the result of a covariant derivative is to be a tensor, since tensors
are additive creatures.
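The two statements about logarithmic derivatives in this paragraph can be checked with a few lines of code. This is a minimal sketch using a central-difference estimate; the function name, the step size, and the sample values of $k$ and $c$ are hypothetical.

```python
import math


def logarithmic_derivative(f, x, h=1e-5):
    """Central-difference estimate of (1/f) df/dx at the point x."""
    return (f(x + h) - f(x - h)) / (2.0 * h * f(x))


k = 3.0
c = -2.5

# y scales up by a factor of k per unit x, so the result should be close to ln k.
print(logarithmic_derivative(lambda x: k ** x, 0.7), math.log(k))

# The logarithmic derivative of exp(c x) should be close to c at any x.
print(logarithmic_derivative(lambda x: math.exp(c * x), 1.3), c)
```

In both cases the answer does not depend on the point $x$ at which it is evaluated, which reflects the fact that a uniform change of scale contributes a constant additive correction.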


What about quantities that are not second-rank covariant tensors?
Under a rescaling of coordinates by a factor of $k$, covectors
scale by $k^{-1}$, and second-rank tensors with two lower indices scale
by $k^{-2}$. The correction term should therefore be half as much for
covectors,

$$\nabla_X = \frac{d}{dX} - \frac{1}{2} G^{-1}\frac{dG}{dX},$$

and should have an opposite sign for vectors.
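As a concrete check of these one-dimensional rules, here is a worked example. The choice $X = e^x$ is an assumption made purely for illustration; it is not taken from the text. It starts from the coordinate $x$ in which $g = 1$ and $v$ is constant.

```latex
\begin{align*}
X &= e^x, \qquad g = 1, \qquad v = \text{constant}, \\
V &= v\,\frac{dX}{dx} = vX, \qquad \frac{dV}{dX} = v \neq 0, \\
G &= g\left(\frac{dx}{dX}\right)^2 = X^{-2}, \qquad
G^{-1}\frac{dG}{dX} = -\frac{2}{X}, \\
\nabla_X V &= \frac{dV}{dX} + \frac{1}{2}\,G^{-1}\frac{dG}{dX}\,V
            = v - \frac{1}{X}\,vX = 0, \\
\nabla_X G &= \frac{dG}{dX} - G^{-1}\frac{dG}{dX}\,G = 0 .
\end{align*}
```

The naive derivative $dV/dX$ is nonzero, but the corrected derivative vanishes, as it should for a vector that isn't really varying.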

Generalizing the correction term to derivatives of vectors in more
than one dimension, we should have something of this form:

$$\nabla_a v^b = \partial_a v^b + \Gamma^b{}_{ac} v^c$$

$$\nabla_a v_b = \partial_a v_b - \Gamma^c{}_{ba} v_c,$$

where $\Gamma^b{}_{ac}$, called the Christoffel symbol, does not transform
like a tensor, and involves derivatives of the metric. (“Christoffel” is
pronounced “Krist-AWful,” with the accent on the middle syllable.)

An important gotcha is that when we evaluate a particular
component of a covariant derivative such as $\nabla_2 v^3$, it is possible for the
result to be nonzero even if the component $v^3$ vanishes identically.

Christoffel symbols on the globe Example 10

As a qualitative example, consider the airplane trajectory shown
in figure i, from London to Mexico City. This trajectory is the
shortest one between these two points; such a minimum-length
trajectory is called a geodesic. In physics it is customary to work with
the colatitude, $\theta$, measured down from the north pole, rather than
the latitude, measured from the equator. At P, over the North
Atlantic, the plane’s colatitude has a minimum. (We can see, without
having to take it on faith from the figure, that such a minimum
must occur. The easiest way to convince oneself of this is to
consider a path that goes directly over the pole, at $\theta = 0$.)

At P, the plane’s velocity vector points directly west. At Q, over
New England, its velocity has a large component to the south.
Since the path is a geodesic and the plane has constant speed,
the velocity vector is simply being parallel-transported; the
vector’s covariant derivative is zero. Since we have $v^\theta = 0$ at P,
the only way to explain the nonzero and positive value of
$\partial_\phi v^\theta$ is that we have a nonzero and negative value of
$\Gamma^\theta{}_{\phi\phi}$.

By symmetry, we can infer that $\Gamma^\theta{}_{\phi\phi}$ must have a
positive value in the southern hemisphere, and must vanish at the
equator.

$\Gamma^\theta{}_{\phi\phi}$ is computed in example 11 on page 197.

Symmetry also requires that this Christoffel symbol be independent
of $\phi$, and it must also be independent of the radius of the
sphere.
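The qualitative conclusions of this example can be compared against a direct numerical evaluation. The sketch below is an illustration under stated assumptions: it uses the standard metric of a sphere, $g_{\theta\theta} = R^2$ and $g_{\phi\phi} = R^2\sin^2\theta$, and the standard textbook expression for the Christoffel symbol of a diagonal metric, $\Gamma^\theta{}_{\phi\phi} = -\frac{1}{2} g^{\theta\theta}\,\partial_\theta g_{\phi\phi}$, neither of which has been derived at this point in the text. The function name and sample radii are hypothetical.

```python
import math


def christoffel_theta_phiphi(theta, radius=1.0, h=1e-6):
    """Gamma^theta_{phi phi} on a sphere, from the metric by central differences."""

    def g_phiphi(t):
        return (radius * math.sin(t)) ** 2

    g_thetatheta = radius ** 2
    d_g_phiphi = (g_phiphi(theta + h) - g_phiphi(theta - h)) / (2.0 * h)
    return -0.5 * d_g_phiphi / g_thetatheta


print(christoffel_theta_phiphi(math.pi / 4))        # northern hemisphere
print(christoffel_theta_phiphi(math.pi / 2))        # equator
print(christoffel_theta_phiphi(3 * math.pi / 4))    # southern hemisphere
print(christoffel_theta_phiphi(math.pi / 4, radius=6371.0))  # different radius
```

The sign pattern (negative in the north, zero at the equator, positive in the south) and the independence of the radius match the symmetry arguments given in the example.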

To compute the covariant derivative of a higher-rank tensor, we
just add more correction terms, e.g.,

$$\nabla_a U_{bc} = \partial_a U_{bc} - \Gamma^d{}_{ba} U_{dc} - \Gamma^d{}_{ca} U_{bd}$$

or

$$\nabla_a U_b{}^c = \partial_a U_b{}^c - \Gamma^d{}_{ba} U_d{}^c + \Gamma^c{}_{ad} U_b{}^d.$$

i / Example 10.

j / Birdtracks notation for the covariant derivative.

With the partial derivative $\partial_a$, it does not make sense to use the
metric to raise the index and form $\partial^a$. It does make sense to do so
with covariant derivatives, so $\nabla^a = g^{ab}\nabla_b$ is a correct identity.

9.4.1 Comma, semicolon, and birdtracks notation 

Some authors use superscripts with commas and semicolons to 
indicate partial and covariant derivatives. The following equations 
give equivalent notations for the same derivatives: 

$$\partial_\mu = \frac{\partial}{\partial x^\mu}$$

$$\partial_\mu X_\nu = X_{\nu,\mu}$$

$$\nabla_a X_b = X_{b;a}$$

$$\nabla_a X^b = X^b{}_{;a}$$


Figure j shows two examples of the corresponding birdtracks no- 
tation. Because birdtracks are meant to be manifestly coordinate- 
independent, they do not have a way of expressing non-covariant 
derivatives. 


9.4.2 Finding the Christoffel symbol from the metric 

We’ve already found the Christoffel symbol in terms of the metric
in one dimension. Expressing it in tensor notation, we have

$$\Gamma^d_{ba} = \frac{1}{2} g^{cd} (\partial_? g_{??}),$$

where inversion of the one-component matrix $G$ has been replaced
by matrix inversion, and, more importantly, the question marks indicate
that there would be more than one way to place the subscripts
so that the result would be a grammatical tensor equation. The
most general form for the Christoffel symbol would be

$$\Gamma^d_{ac} = \frac{1}{2} g^{db} ( L\,\partial_c g_{ab} + M\,\partial_a g_{cb} + N\,\partial_b g_{ca} ),$$

where L, M, and N are constants. Consistency with the one- 
dimensional expression requires L + M + N = 1. The condition 
L = M arises on physical, not mathematical grounds; it reflects the 
fact that experiments have not shown evidence for an effect called 
torsion, in which vectors would rotate in a certain way when trans- 
ported. The L and M terms have a different physical significance 
than the N term. 

Suppose an observer uses coordinates such that all objects are
described as lengthening over time, and the change of scale accumulated
over one day is a factor of $k > 1$. This is described by the
derivative $\partial_t g_{xx} < 0$, which affects the M term. Since the metric is
used to calculate squared distances, the $g_{xx}$ matrix element scales
down by $1/\sqrt{k}$. To compensate for $\partial_t v^x < 0$, we need to add a
positive correction term, $M > 0$, to the covariant derivative. When
the same observer measures the rate of change of a vector $v^t$ with
respect to space, the rate of change comes out to be too small, because
the variable she differentiates with respect to is too big. This
requires $N < 0$, and the correction is of the same size as the M
correction, so $|M| = |N|$. We find $L = M = -N = 1$.

Self-check: Does the above argument depend on the use of space 
for one coordinate and time for the other? 

The resulting general expression for the Christoffel symbol in
terms of the metric is

$$\Gamma^c_{ab} = \frac{1}{2} g^{cd} ( \partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab} ) .$$

One can go back and check that this gives $\nabla_c g_{ab} = 0$.

Self-check: In the case of 1 dimension, show that this reduces to
the earlier result of $-(1/2)\,dG/dX$.

$\Gamma$ is not a tensor, i.e., it doesn’t transform according to the tensor
transformation rules. Since $\Gamma$ isn’t a tensor, it isn’t obvious that the
covariant derivative, which is constructed from it, is tensorial. But
if it isn’t obvious, neither is it surprising: the goal of the above
derivation was to get results that would be coordinate-independent.

Christoffel symbols on the globe, quantitatively Example 11
In example 10 on page 195, we inferred the following properties
for the Christoffel symbol $\Gamma^\theta_{\phi\phi}$ on a sphere of radius $R$: $\Gamma^\theta_{\phi\phi}$ is
independent of $\phi$ and $R$, $\Gamma^\theta_{\phi\phi} < 0$ in the northern hemisphere
(colatitude $\theta$ less than $\pi/2$), $\Gamma^\theta_{\phi\phi} = 0$ on the equator, and
$\Gamma^\theta_{\phi\phi} > 0$ in the southern hemisphere.

The metric on a sphere is $ds^2 = R^2\,d\theta^2 + R^2 \sin^2\theta\,d\phi^2$. The only
nonvanishing term in the expression for $\Gamma^\theta_{\phi\phi}$ is the one involving
$\partial_\theta g_{\phi\phi} = 2R^2 \sin\theta \cos\theta$. The result is $\Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta$, which
can be verified to have the properties claimed above.
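
The result of example 11 can also be checked numerically. The following is a minimal sketch, not part of the original example, that evaluates the general expression for the Christoffel symbol in terms of the metric using central finite differences. The function names `metric` and `christoffel`, the index convention (0 for the colatitude, 1 for the longitude), and the step size `h` are choices made for this illustration only.

```python
import math

def metric(theta, phi, R):
    """Metric of a sphere of radius R in (theta, phi) coordinates (diagonal)."""
    return [[R**2, 0.0], [0.0, (R * math.sin(theta))**2]]

def christoffel(c, a, b, theta, phi, R, h=1e-5):
    """Gamma^c_{ab} = (1/2) g^{cd} (d_a g_{bd} + d_b g_{ad} - d_d g_{ab}),
    with the partial derivatives estimated by central differences."""
    def dg(k, i, j):
        # partial derivative of g_{ij} with respect to coordinate k
        # (k = 0 is theta, k = 1 is phi)
        dth, dph = (h, 0.0) if k == 0 else (0.0, h)
        gp = metric(theta + dth, phi + dph, R)
        gm = metric(theta - dth, phi - dph, R)
        return (gp[i][j] - gm[i][j]) / (2 * h)

    g = metric(theta, phi, R)
    # the metric is diagonal, so the inverse is the reciprocal of each entry
    ginv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]
    return 0.5 * sum(
        ginv[c][d] * (dg(a, b, d) + dg(b, a, d) - dg(d, a, b))
        for d in range(2)
    )

# Gamma^theta_{phi phi} at colatitude 60 degrees, for two different radii
north_small = christoffel(0, 1, 1, math.pi / 3, 0.3, R=2.0)
north_large = christoffel(0, 1, 1, math.pi / 3, 1.7, R=5.0)
equator = christoffel(0, 1, 1, math.pi / 2, 0.0, R=2.0)
```

Within the accuracy of the finite differences, the two northern-hemisphere values should agree with each other and with $-\sin\theta\cos\theta$, and the equatorial value should vanish, as claimed in the example.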

9.4.3 The geodesic equation 

A world-line is a timelike curve in spacetime. As a special case, 
some such curves are actually not curved but straight. Physically, 
the ones we consider straight are those that could be the world- 
line of a test particle not acted on by any non-gravitational forces 
(sec. 5.1, p. 117). Mathematically, we will show in this section how 
the Christoffel symbols can be used to find differential equations 
that describe such motion. The world-line of a test particle is called 
a geodesic. The equations also have solutions that are spacelike or 
lightlike, and we consider these to be geodesics as well. 

Geodesics play the same role in relativity that straight lines
play in Euclidean geometry. In Euclidean geometry, we can specify
two points and ask for the curve connecting them that has minimal
length. The answer is a line. In special relativity, a timelike
geodesic maximizes the proper time (cf. section 2.4.2, p. 48) between
two events.

k / The geodesic, 1, preserves tangency under parallel transport.
The non-geodesic curve, 2, doesn’t have this property; a vector
initially tangent to the curve is no longer tangent to it when
parallel-transported along it.

In special relativity, geodesics are given by linear equations when 
expressed in Minkowski coordinates, and the velocity vector of a test 
particle has constant components when expressed in Minkowski co- 
ordinates. In general relativity, Minkowski coordinates don’t exist, 
and geodesics don’t have the properties we expect based on Eu- 
clidean intuition; for example, initially parallel geodesics may later 
converge or diverge. 

Characterization of the geodesic 

A geodesic can be defined as a world-line that preserves tangency 
under parallel transport, k. This is essentially a mathematical way 
of expressing the notion that we have previously expressed more 
informally in terms of “staying on course” or moving “inertially.” 
(For reasons discussed in more detail on p. 200, this definition is 
preferable to defining a geodesic as a curve of extremal or stationary 
metric length.) 

A curve can be specified by giving functions $x^i(\lambda)$ for its coordinates,
where $\lambda$ is a real parameter. A vector lying tangent to the
curve can then be calculated using partial derivatives, $T^i = \partial x^i/\partial\lambda$.
There are three ways in which a vector function of $\lambda$ could change:
(1) it could change for the trivial reason that the metric is changing,
so that its components changed when expressed in the new metric;
(2) it could change its components perpendicular to the curve; or
(3) it could change its component parallel to the curve. Possibility
1 should not really be considered a change at all, and the definition
of the covariant derivative is specifically designed to be insensitive
to this kind of thing. 2 cannot apply to $T^i$, which is tangent by
construction. It would therefore be convenient if $T^i$ happened to
be always the same length. If so, then 3 would not happen either,
and we could reexpress the definition of a geodesic by saying that
the covariant derivative of $T^i$ was zero. For this reason, we will
assume for the remainder of this section that the parametrization
of the curve has this property. In a Newtonian context, we could
imagine the $x^i$ to be purely spatial coordinates, and $\lambda$ to be a universal
time coordinate. We would then interpret $T^i$ as the velocity,
and the restriction would be to a parametrization describing motion
with constant speed. In relativity, the restriction is that $\lambda$ must be
an affine parameter. For example, it could be the proper time of a
particle, if the curve in question is timelike.

Covariant derivative with respect to a parameter 

The notation of section 9.4 is not quite adapted to our present
purposes, since it allows us to express a covariant derivative with
respect to one of the coordinates, but not with respect to a parameter
such as $\lambda$. We would like to notate the covariant derivative of
$T^i$ with respect to $\lambda$ as $\nabla_\lambda T^i$, even though $\lambda$ isn’t a coordinate. To
connect the two types of derivatives, we can use a total derivative.
To make the idea clear, here is how we calculate a total derivative
for a scalar function $f(x,y)$, without tensor notation:

$$\frac{df}{d\lambda} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\lambda} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial\lambda} .$$

This is just the generalization of the chain rule to a function of two
variables. For example, if $\lambda$ represents time and $f$ temperature,
then this would tell us the rate of change of the temperature as
a thermometer was carried through space. Applying this to the
present problem, we express the total covariant derivative as

$$\nabla_\lambda T^i = (\nabla_b T^i)\frac{dx^b}{d\lambda} = (\partial_b T^i + \Gamma^i_{bc} T^c)\frac{dx^b}{d\lambda} .$$


The geodesic equation 

Recognizing $\partial_b T^i\,dx^b/d\lambda$ as a total non-covariant derivative, we
find

$$\nabla_\lambda T^i = \frac{dT^i}{d\lambda} + \Gamma^i_{bc} T^c \frac{dx^b}{d\lambda} .$$

Substituting $\partial x^i/\partial\lambda$ for $T^i$, and setting the covariant derivative
equal to zero, we obtain

$$\frac{d^2 x^i}{d\lambda^2} + \Gamma^i_{bc} \frac{dx^c}{d\lambda}\frac{dx^b}{d\lambda} = 0 .$$

This is known as the geodesic equation.


If this differential equation is satisfied for one affine parameter
$\lambda$, then it is also satisfied for any other affine parameter $\lambda' = a\lambda + b$,
where $a$ and $b$ are constants (problem 5, p. 214). Recall that affine
parameters are only defined along geodesics, not along arbitrary 
curves. We can’t start by defining an affine parameter and then use 
it to find geodesics using this equation, because we can’t define an 
affine parameter without first specifying a geodesic. Likewise, we 
can’t do the geodesic first and then the affine parameter, because if 
we already had a geodesic in hand, we wouldn’t need the differential 
equation in order to find a geodesic. The solution to this chicken- 
and-egg conundrum is to write down the differential equations and 
try to find a solution, without trying to specify either the affine 
parameter or the geodesic in advance. 

The geodesic equation is useful in establishing one of the nec- 
essary theoretical foundations of relativity, which is the uniqueness 
of geodesics for a given set of initial conditions. If the geodesic 
were not uniquely determined, then particles would have no way of 
deciding how to move. The form of the geodesic equation guaran- 
tees uniqueness, because one can use it to define an algorithm that 
constructs a geodesic for a given set of initial conditions. 
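
As an illustration of such an algorithm, here is a minimal sketch that integrates the geodesic equation numerically on the unit sphere, starting from a given position and velocity. Several things here are assumptions made for the illustration rather than anything taken from the text: the choice of the unit sphere as the space, the use of a fourth-order Runge-Kutta stepper, the function names, and the Christoffel symbol $\Gamma^\phi_{\theta\phi} = \cot\theta$, which follows from the general formula in section 9.4.2 but is not computed in example 11.

```python
import math

def geodesic_rhs(state):
    """Geodesic equation on the unit sphere as a first-order system.
    state = (theta, phi, dtheta/dlambda, dphi/dlambda)."""
    th, ph, dth, dph = state
    # theta'' = -Gamma^theta_{phi phi} phi'^2, Gamma^theta_{phi phi} = -sin cos
    ddth = math.sin(th) * math.cos(th) * dph**2
    # phi'' = -2 Gamma^phi_{theta phi} theta' phi', Gamma^phi_{theta phi} = cot
    ddph = -2.0 * (math.cos(th) / math.sin(th)) * dth * dph
    return (dth, dph, ddth, ddph)

def rk4_step(state, h):
    """Advance the state by one step h in the parameter lambda."""
    def shifted(k, f):
        return tuple(s + f * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, h / 2))
    k3 = geodesic_rhs(shifted(k2, h / 2))
    k4 = geodesic_rhs(shifted(k3, h))
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate(state, h, n):
    """Take n steps of size h starting from the given initial conditions."""
    for _ in range(n):
        state = rk4_step(state, h)
    return state

def speed_squared(state):
    """Squared length of the tangent vector, g_ij T^i T^j, on the unit sphere."""
    th, ph, dth, dph = state
    return dth**2 + (math.sin(th) * dph)**2

# start on the equator with unit speed, heading north-east
start = (math.pi / 2, 0.0, -0.6, 0.8)
end = integrate(start, 1e-3, 1000)   # total parameter interval of 1
```

The initial conditions fix the curve completely: the integration follows the great circle through the starting point in the starting direction, and the length of the tangent vector stays constant, which is what we expect if $\lambda$ is an affine parameter. The sketch avoids the poles, where these coordinates are singular.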
Not characterizable as curves of stationary length

The geodesic equation may seem cumbersome. Why not just 
define a geodesic as a curve connecting two points that maximizes 
or minimizes its own metric length? The trouble is that this doesn’t 
generalize nicely to curves that are not timelike. The casual reader 
may wish to skip the remainder of this subsection, which discusses 
this point. 

For the spacelike case, we would want to define the proper metric
length $\sigma$ of a curve as $\sigma = \int \sqrt{-g_{ij}\,dx^i\,dx^j}$, the minus sign being
necessary because we are using a metric with signature $+---$, and
we want the result to be real. The quantity $\sigma$ can be thought of as
the result we would get by approximating the curve with a chain of
short line segments, and adding their proper lengths. In the case
where the whole curve lies within a plane of simultaneity for some
observer, $\sigma$ is the curve’s Euclidean length as measured by that observer.
Our $\sigma$ is neither a maximum nor a minimum for a spacelike
geodesic connecting two events. To see this, pick a frame in which
the two events are simultaneous, and adopt Minkowski coordinates
such that the points both lie on the $x$ axis. Deforming the geodesic
in the $xy$ plane does what we expect according to Euclidean geometry:
it increases the length. Deforming it in the $xt$ plane, however,
reduces the length (as becomes obvious when you consider the case
of a large deformation that turns the geodesic into a curve of length
zero, consisting of two lightlike line segments). The result is that
the geodesic is neither a minimizer nor a maximizer of $\sigma$.
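
The argument above can be made concrete with a short numerical sketch. It approximates each curve by a chain of straight segments, as described in the text, and adds up their proper lengths. The events A and B, the size of the deformations, and the function name `proper_length` are arbitrary choices made for this illustration.

```python
import math

def proper_length(points):
    """Add up sqrt(dx^2 + dy^2 - dt^2) over the straight segments joining
    successive events (t, x, y). Each segment is assumed to be spacelike
    or lightlike; tiny negative rounding errors are clipped to zero."""
    total = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        s2 = (x1 - x0)**2 + (y1 - y0)**2 - (t1 - t0)**2
        total += math.sqrt(max(s2, 0.0))
    return total

# two simultaneous events on the x axis, as events (t, x, y)
A = (0.0, 0.0, 0.0)
B = (0.0, 1.0, 0.0)

straight = proper_length([A, B])                     # the geodesic
bent_in_y = proper_length([A, (0.0, 0.5, 0.1), B])   # deformed in the xy plane
bent_in_t = proper_length([A, (0.1, 0.5, 0.0), B])   # deformed in the xt plane
lightlike = proper_length([A, (0.5, 0.5, 0.0), B])   # two lightlike segments
```

The deformation in the $xy$ plane gives a longer curve than the geodesic, the deformation in the $xt$ plane gives a shorter one, and the large deformation made of two lightlike segments has length zero.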

Maximizing or minimizing the proper length is a strong require- 
ment. A related but more permissive criterion to apply to a curve 
connecting two fixed points is that if we vary the curve by some 
small amount, the variation in length should vanish to first order. 
For example, two points A and B on the surface of the earth deter- 
mine a great circle, i.e., a circle whose circumference equals that of 
the earth. This great circle gives us two different paths by which we 
could travel from A to B. One of these will usually be longer than 
the other. Both of these are as straight as they can be while keeping 
to the surface of the earth, so in this context of spherical geometry 
they are both considered to be geodesics. One thing that the two 
paths have in common is that they are both stationary. Stationarity 
is defined as follows. Given a certain parametrized curve $\gamma(t)$, let
us fix some vector $h(t)$ at each point on the curve that is tangent
to the earth’s surface, and let $h$ be a continuous function of $t$ that
vanishes at the end-points. Then if $\epsilon$ is small compared to the radius
of the earth, we can clearly define what it means to perturb $\gamma$ by
$\epsilon h$, producing another curve $\gamma^*$ similar to, but not the same as, $\gamma$.
Stationarity means that the difference in length between $\gamma$ and $\gamma^*$ is
of order $\epsilon^2$ for small $\epsilon$. This is a generalization of the elementary
calculus notion that a function has a zero derivative near an extremum
or point of inflection. In our example on the surface of the earth,
the two geodesics connecting A and B are both stationary.

Spacelike geodesics in special relativity are stationary by the
above definition. However, this assertion may be misleading. Because
we construct the displacement as the product $\epsilon h$, its derivative
is also guaranteed to shrink in proportion to $\epsilon$ for small $\epsilon$. We could
loosen this requirement a little bit, and only require that the magnitude
of the displacement be of order $\epsilon$. In this case, one can show
that spacelike curves are not stationary. For example, any spacelike
curve can be approximated to an arbitrary degree of precision by
a chain of lightlike geodesic segments. Thus an arbitrarily small
perturbation in the curve reduces its length to zero.

The situation becomes even worse for lightlike geodesics. Here
we would have to define what “length” was. We could either take an
absolute value, $L = \int \sqrt{|g_{ij}\,dx^i\,dx^j|}$, or not, $L = \int \sqrt{g_{ij}\,dx^i\,dx^j}$. If we
don’t take the absolute value, $L$ need not be real for small variations
of the geodesic, and therefore we don’t have a well-defined ordering,
and can’t say whether $L$ is a maximum, a minimum, or neither.
Regardless of whether we take the absolute value, we have $L = 0$
for a lightlike geodesic, but the square root function doesn’t have
differentiable behavior when its argument is zero, so we don’t have
stationarity. If we do take the absolute value, then for the geodesic
curve, the length is zero, which is the shortest possible. However,
one can have nongeodesic curves of zero length, such as a lightlike
helical curve about the $t$ axis.


9.5 * Congruences, expansion, and rigidity

This chapter has focused on fluxes of conserved quantities; we wanted
to rule out pictures like l/1, in which the appearance and disappearance
of world-lines would imply nonconservation of properties such
as charge and mass-energy. But the mathematical techniques we’ve
developed turn out to be an elegant way to approach the different
issues described in the other parts of figure l.

9.5.1 Congruences

In l/2, we have expansion. For example, the world-lines could
represent galaxies getting farther apart because of cosmological expansion
resulting from the Big Bang. We do not expect rulers to
expand or contract, in the sense that although a ruler may exhibit
Lorentz contraction, it should always have the same length in its
own rest frame unless it has been mechanically stressed or altered.

If there is more than one spatial dimension, then we can have
rotation, as in l/3. These world-lines could represent a constellation
of orbiting satellites, or fixed points in a rotating laboratory.

The other interesting possibility, if there is more than one spatial
dimension, is shear, figure l/4. Here the rectangular group of four
particles contracts in one direction while expanding in the other so
as to keep the enclosed 2-volume constant.

l / 1. Nonconservation. 2. Expansion. 3. Rotation. 4. Shear.

In order to discuss these possibilities, it is convenient to define 
the notion of a timelike congruence, which is a set of nonintersecting, 
smooth, timelike world-lines whose union constitutes all the events 
in some region of spacetime. That is, we “fill in” spacetime with 
an infinite number of world-lines so that there is no space between 
them. This is something like the grain in a piece of wood, or Fara- 
day’s conception of field lines filling space, except that one of our 
n + 1 dimensions is timelike, and the lines aren’t allowed to point 
in directions that lie outside the light cone. One way to specify a 
congruence is to give the normalized velocity vector that is tangent 
to the world-line passing through any given point. 

An expanding congruence Example 12

As an example of a congruence in 1 + 1 dimensions, consider the
set of all curves of the form $x = ae^{bt}$, where $a$ and $b$ are positive
constants. It would look like figure l/2. Letting $u = dx/dt = abe^{bt}$,
the velocity vector is $v^\lambda = \gamma(1, u)$, where the factor of
$\gamma = 1/\sqrt{1 - u^2}$ gives the proper normalization $v^\lambda v_\lambda = 1$.

A boring congruence Example 13

Suppose we instead let the congruence consist of the set of all
curves of the form $x = c + ut$, where $c$ and $u$ are constants and
$|u| < 1$. Then as in example 12, $v^\lambda = \gamma(1, u)$. The world-lines
are inertial and parallel to one another.

9.5.2 Expansion and rigidity 

For the remainder of this discussion, we restrict ourselves to the
1 + 1-dimensional case, so that rotation and shear are impossible,
and the only interesting question is whether a given congruence has
expansion. In 1 + 1 dimensions, the congruence can be specified
by giving the function $u(x,t)$, where as in examples 12 and 13,
$u = dx/dt$. If $u$ is constant, then we have example 13, and clearly
there is no expansion. Thus expansion requires either $\partial u/\partial t$ or
$\partial u/\partial x$, or both, to be nonzero.


m / 1. A congruence with $\partial u/\partial x$ equal to zero. 2. A congruence
with $\partial u/\partial t = 0$. 3. A congruence without expansion.

Figure m/1 shows the case where $\partial u/\partial x = 0$ and $\partial u/\partial t \ne 0$.
Each world-line is a copy of the others that has been shifted spatially,
and the two velocity vectors shown as arrows are equal. This is
precisely Bell’s spaceship paradox (section 3.9.2, p. 71). Although 
the horizontal spacing between the world-lines remains constant as 
defined by the fixed frame of reference used for the diagram, an 
observer accelerating along with one of the particles would find that 
they had expanded away from one another, because the observer’s 
meter-sticks have Lorentz-contracted. This is a real expansion in 
the sense that if the world-lines are particles in a solid object, the 
object comes under increasing tension. 

In m/2 we have $\partial u/\partial t = 0$ and $\partial u/\partial x \ne 0$. The world-lines are
copies of one another that have been shifted temporally. The two 
velocity vectors in the diagram are the same. All of the particles 
began accelerating from the same point in space, but at different 
times. Here there is clearly an expansion, because the world-lines 
are getting farther apart. 

Suppose that we accelerate a rigid object such as a ruler. Then 
we must have something like m/3. To avoid the situations described 
in m/1 and m/2, the velocity vector must vary with both t and x ; the 
three velocity vectors in the figure are all different. As the particles 
accelerate, the spacing between them Lorentz-contracts, so that an 
observer accelerating along with them sees the spacing as remaining 
constant. 

This notion of rigid motion in relativity is called Born rigidity. 
No physical substance can naturally be perfectly rigid (Born rigid), 
for if it were, then the speed at which sound waves traveled in it 
would be greater than c. Born rigidity can only be accomplished 
through a set of external forces applied at all points on the object 
according to a program that has been planned in advance. A real 
object such as a ruler does not maintain its own Born-rigidity, but 
it will eventually return to its original size and shape after having 
undergone relativistic acceleration, due to its own elastic proper- 
ties, provided that the acceleration has been gentle enough to avoid 
permanently damaging it. In 1 + 1 dimensions, Born rigidity is equiv- 
alent to a lack of expansion. In 3 + 1 dimensions, we also require 
vanishing shear. 

Mathematically, it is clear that the condition of vanishing expansion
must be expressible in 1 + 1 dimensions in terms of the
partial derivatives $\partial u/\partial t$ and $\partial u/\partial x$, and since we have been able
to describe the condition in a frame-independent way (by referring
it to observations made by the comoving observer), it should also
be something we can express as a scalar within the grammar of index
gymnastics. There is only one possible way to express such a
condition, which is

$$\partial_a v^a = 0 .$$

We can in fact define a scalar $\Theta$, called the expansion scalar, according to

$$\Theta = \partial_a v^a .$$

This definition is valid in n + 1 dimensions, but in 1 + 1 dimensions
it reduces to $\Theta = \partial\gamma/\partial t + \partial(u\gamma)/\partial x$.

o / A caustic in the lines of simultaneity of the family of
accelerated world-lines.

The expansion scalar is interpreted as the fractional rate of
change in the volume of a set of particles that move along the world-lines
defined by the congruence, where the rate of change is defined
with respect to the proper time $\tau$ of an observer moving along with
the particles. For example, cosmological expansion leads to a fractional
increase in the distances between galaxies $\Delta L/L$ which, for
a small time interval $\Delta t$, is equal to $H_0 \Delta t$, where $H_0$, called the
Hubble constant, is about $2.3 \times 10^{-18}\ \mathrm{s}^{-1}$. That is, the fractional
rate of change is $(1/L)\,dL/d\tau = H_0$. Because distances expand in
all three spatial dimensions, the fractional rate of change of volume
is $\Theta = (1/V)\,dV/d\tau = 3H_0$. (In this example, spacetime is not flat,
so we would have to express $\Theta$ in terms of the covariant derivative
$\nabla_a$ defined in section 9.4, not the partial derivative $\partial_a$.)

A catastrophe Example 14 

Consider the timelike congruence in 1 + 1 dimensions defined by 
u = x/t. This consists of the set of all inertial world-lines passing 
through the origin. Since our definition of a congruence requires 
that the world-lines be non-intersecting, let’s restrict this example 
to the interior of the past light cone of the origin, |x| < -t. We 
have a universe full of hapless particles, all heading like lemmings 
toward a catastrophic collision. The spacetime diagram looks like 
an optical ray diagram for the formation of a real image. A computation
gives the unexpectedly simple result $\Theta = \gamma/t$. For $t < 0$,
this is negative, indicating a contraction, and it blows up to minus
infinity as $t$ approaches 0.
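
The result of example 14 can be spot-checked numerically. The sketch below is not part of the original example. It estimates the 1 + 1-dimensional expression $\Theta = \partial\gamma/\partial t + \partial(u\gamma)/\partial x$ by central finite differences, in units with $c = 1$. The function names, the step size `h`, and the sample event are arbitrary choices for the illustration.

```python
import math

def gamma(u):
    """Lorentz factor for velocity u, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - u * u)

def expansion(u, t, x, h=1e-6):
    """Finite-difference estimate of the expansion scalar
    Theta = d(gamma)/dt + d(u*gamma)/dx for a velocity field u(t, x)."""
    d_gamma_dt = (gamma(u(t + h, x)) - gamma(u(t - h, x))) / (2 * h)
    d_ugamma_dx = (
        u(t, x + h) * gamma(u(t, x + h)) - u(t, x - h) * gamma(u(t, x - h))
    ) / (2 * h)
    return d_gamma_dt + d_ugamma_dx

def u_collapse(t, x):
    """Velocity field of example 14: inertial world-lines through the origin."""
    return x / t

# a sample event inside the past light cone of the origin, |x| < -t
t, x = -2.0, 0.5
theta_numeric = expansion(u_collapse, t, x)
theta_formula = gamma(x / t) / t
```

The two values should agree to within the accuracy of the finite differences, and both are negative, as expected for a contraction.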

9.5.3 Caustics 

The apex of the cone in example 14 is a caustic. Given a space- 
filling set of straight lines, a caustic occurs where their intensity 
diverges to infinity. The word means “burning,” because in optics 
a caustic of light rays concentrates energy and can burn things. 
Example 14 involves a caustic of timelike world-lines, and “straight” 
is to be interpreted as meaning that the world-lines are inertial. 

Figure o shows two caustics formed by spacelike lines for the 
accelerated coordinate system described in 7.1. Here, as is often the 
case, the caustics are not just points. 

An example from general relativity is that when a black hole
forms by gravitational collapse, a caustic is formed at one point
by the set of lightlike world-lines that enter the event horizon from
the outside universe at the moment when the horizon is formed. If
a ray of light is emitted from this caustic point, it remains on the
event horizon forever, as do all rays emitted at the horizon in the
outward direction at later times. The event horizon is the same set
of events as the union of all the lightlike world-lines that enter the
horizon at the caustic. 4

9.5.4 The Herglotz-Noether theorem in 1+1 dimensions 

Certain Born-rigid types of motion are possible, and others are 
not, purely as a matter of kinematics. It turns out to be possible to 
accelerate a rod in a Born-rigid way along its own length (problem 7, 
p. 214), but surprisingly, it is not possible, for example, for a sphere 
to remain Born-rigid while simultaneously rotating and having its 
center of mass accelerated. The possible types of motion are de- 
lineated by a theorem called the Herglotz-Noether theorem. Unlike 
the 3 + 1-dimensional version of the theorem, the 1 + 1-dimensional 
version is neither surprising nor difficult to state or prove. 

Herglotz-Noether theorem in 1+1 dimensions: Any rigid motion 
in 1 + 1 dimensions is uniquely determined by the world-line W of 
one point, provided that the world-line of that point is smooth and 
timelike. It is in general only possible to extend the congruence 
describing the motion to some neighborhood of W. 

Proof: To avoid technical issues, we assume that “smooth” means
analytic, which slightly weakens the result. As discussed above, zero
expansion is equivalent to $0 = \partial\gamma/\partial t + \partial(u\gamma)/\partial x$, where $(t,x)$ are
any set of Minkowski coordinates. This can be put in the form
$\partial u/\partial x = f(u)\,\partial u/\partial t$, where $f$ is smooth for $-1 < u < 1$ and
$f(0) = 0$. We need to prove that the solution of this partial differential
equation, if it exists, is unique given W. We arbitrarily
choose one event on W. By assumption, W is timelike at this point,
so we are free to choose our Minkowski coordinates such that our
point is at rest at this event at the origin. Since $f(0) = 0$, it follows
that at the origin $\partial u/\partial x = 0$. We can similarly evaluate the higher
derivatives $\partial^n u/\partial x^n$, and because $u$ is smooth we can in this calculation
freely interchange the order of the partial derivatives $\partial_x$ and $\partial_t$.
It is straightforward to show that these higher derivatives $\partial^n u/\partial x^n$
are also zero. Since $u(x)$ is assumed to be analytic, it follows that
$u(0, x) = 0$ for all $x$, i.e., an observer instantaneously moving along
W at $t = 0$ says that all other points are at rest as well at that time.
But because W is timelike, we can always find some neighborhood A
of W such that every point in A is simultaneous with a unique event
on W according to an observer at that event moving along with W.
(Cf. p. 73.) Therefore the value of $u$ is determined everywhere in
A, and this completes the proof that the congruence exists and is
unique in A.

Remarks: (1) The 1 + 1-dimensional version of the Herglotz-Noether
theorem is not a special case of the 3 + 1-dimensional version.
The latter is usually proved for a space-filling congruence, and
it fails when the body in question does not enclose a volume, e.g.,
in the case of a thin rod or a letter “C.”

(2) The theorem can be strengthened by relaxing the requirement
of smoothness so that only the existence of a second derivative
of the position with respect to proper time is required. 5

(3) If the motion is accelerated, then the rigid motion cannot be
extended to an arbitrary distance from W. If the proper acceleration
of W can be as great as $a$, then as in example ??, p. ??, we expect
to be able to extend the rigid motion to a proper distance only as
big as $c^2/a$, where there will be a caustic similar to the one in figure o.

4 Penrose, 1968. The proof is presented in Misner, Thorne, and Wheeler, p.
924

9.5.5 Bell’s spaceship paradox revisited 

Bell’s spaceship paradox was discussed in section 3.9.2 on p. 71.
In the paradox, two spaceships begin accelerating simultaneously
and have equal accelerations in the frame of an external, inertial
observer, causing a thread stretched between them to break. We now
give a more rigorous and mathematically elegant demonstration of
the same result, suggested by P. Allen.

The motion of the thread throughout its length can be described 
by a timelike congruence. If the thread is not to come under any 
strain, then this must be a Born-rigid congruence. By the 1 + 1- 
dimensional Herglotz-Noether theorem, the congruence is uniquely 
determined by the motion of one of its points, which we take to be 
the trailing rocket. This congruence happens to be known. It is de- 
fined by the system of accelerated coordinates (Rindler coordinates) 
described in section 7.1, p. 143. The vanishing of the expansion 
scalar for this congruence is left for the reader to verify (problem 7,
p. 214). But this congruence consists of world-lines whose proper
accelerations are each constant and all different from one another, 
and this is inconsistent with the description given in the Bell para- 
dox, where it is stated that a frame exists in which the motions of 
the two ships are identical except for a translation. Therefore the 
thread cannot move rigidly. 

This completes the resolution of the paradox, but as an illustrative
example, we present an explicit calculation of the expansion
scalar for the congruence that one would most naturally imagine
to be implied by the description of the paradox. This is given by
$(x + c)^2 = 1 + t^2$. For a given value of the parameter $c$, we get an
accelerating world-line. (Its proper acceleration $a = 1$ happens to
be constant, example 4, p. 61, although this is not necessary for the
purposes of discussing the paradox.) Each world-line starts at rest
at $t = 0$, and each one has the same acceleration at any given $t$. By
picking any two distinct values of $c$ as the endpoints of the thread,
we obtain the literal situation described in the paradox.

5 Giulini, “The Rich Structure of Minkowski Space,” arxiv.org/abs/0802.4345,
theorems 18 and 22

Implicit differentiation gives $u = t/\sqrt{1 + t^2}$. The algebra gets a
little messy now, so I used the open-source computer algebra system
Maxima. The following program, which should be fairly readable
without previous knowledge of Maxima’s syntax, calculates the expansion
scalar:

    u:t/sqrt(1+t^2);                     /* velocity of every world-line at time t */
    gamma:1/sqrt(1-u^2);                 /* Lorentz factor */
    theta:diff(gamma,t)+diff(u*gamma,x); /* expansion scalar in 1+1 dimensions */
    is(equal(theta,gamma*u^2/t));        /* compare with the simplified form */

The third line prints out a complicated expression for $\Theta$, which
the fourth line shows can be simplified to $\gamma u^2/t$. This is positive for
$t > 0$, which shows that the thread is forced to expand. Note that
although the calculation was carried out in a particular set of coordinates,
a relativistic scalar such as $\Theta$ has a coordinate-independent
value. Reference to a particular coordinate system or frame of reference
occurs only in the initial definition of the congruence, which
is defined in order to model the situation described in the paradox,
which is stated in terms of a particular external observer.
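
As a cross-check that does not rely on a computer algebra system, the sketch below estimates $\Theta$ by finite differences in Python, both for the congruence of the paradox and for the Born-rigid congruence. The velocity field used for the rigid case, $u = t/x$ for $x > |t|$, is an assumption of this sketch: it is the standard form for the hyperbolic world-lines $x^2 - t^2 = \text{const}$ of the accelerated coordinates, and is not derived in this section. The function names and the sample event are likewise arbitrary.

```python
import math

def expansion(u, t, x, h=1e-6):
    """Central-difference estimate of Theta = d(gamma)/dt + d(u*gamma)/dx
    for a velocity field u(t, x) in 1+1 dimensions, with c = 1."""
    def gam(tt, xx):
        return 1.0 / math.sqrt(1.0 - u(tt, xx) ** 2)
    d_gamma_dt = (gam(t + h, x) - gam(t - h, x)) / (2 * h)
    d_ugamma_dx = (
        u(t, x + h) * gam(t, x + h) - u(t, x - h) * gam(t, x - h)
    ) / (2 * h)
    return d_gamma_dt + d_ugamma_dx

def u_bell(t, x):
    """Congruence of the paradox: every world-line (x + c)^2 = 1 + t^2
    has the same velocity at a given t, independent of x."""
    return t / math.sqrt(1.0 + t * t)

def u_rigid(t, x):
    """Assumed Born-rigid congruence: hyperbolas x^2 - t^2 = const."""
    return t / x

t, x = 0.7, 2.0
theta_bell = expansion(u_bell, t, x)
theta_rigid = expansion(u_rigid, t, x)

# the simplified form found with Maxima, evaluated at the same event
u0 = u_bell(t, x)
theta_bell_formula = u0 ** 2 / (t * math.sqrt(1.0 - u0 ** 2))
```

For the congruence of the paradox the numerical estimate should match $\gamma u^2/t$ and be positive, while for the rigid congruence it should vanish to within the accuracy of the finite differences.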

9.6 Units of measurement for tensors 

Analyzing units, also known as dimensional analysis, is one of the 
first things we learn in freshman physics. It’s a useful way of check- 
ing our math, and it seems as though it ought to be straightforward 
to extend the technique to relativity. It certainly can be done, but 
it isn’t quite as trivial as might be imagined. We’ll see below that 
different authors prefer differing systems, and clashes occur between 
some of the notational systems in use. 

One of our most common jobs is to change from one set of units 
to another, but in relativity it becomes nontrivial to define what we 
mean by the notion that our units of measurement change or don’t 
change. We could, e.g., appeal to an atomic standard, but Dicke 6 
points out that this could be problematic. Imagine, he says, that 

you are told by a space traveller that a hydrogen atom on 
Sirius has the same diameter as one on the earth. A few 
moments’ thought will convince you that the statement 
is either a definition or else meaningless. 

(Some related ideas about the numerical value of c were discussed 
on p. 19.) 

6 “Mach’s principle and invariance under transformation of units,” Phys Rev
125 (1962) 2163

To start with, we note that abstract index notation is more
convenient than concrete index notation for these purposes. As noted
in section 7.5, p. 150, concrete index notation assigns different units
to different components of a tensor if we use coordinates, such as
spherical coordinates $(t,r,\theta,\phi)$, that don’t all have units of length.
In abstract index notation, a symbol like $v^i$ stands for the whole
vector, not for one of its components. Since abstract index notation
does not even offer us a notation for components, if we want to
apply dimensional analysis we must define a system in which units
are attributed to a tensor as a whole. Suppose we write down the
abstract-index form of the equation for proper time:

$$ds^2 = g_{ab}\,dx^a\,dx^b$$

In abstract index notation, $dx^a$ doesn’t mean an infinitesimal change
in a particular coordinate; it means an infinitesimal displacement
vector. 7 This equation has one quantity on the left and three
factors on the right. Suppose we assign these parts of the equation
units $[ds] = L^\sigma$, $[g_{ab}] = L^{2\gamma}$, and $[dx^a] = [dx^b] = L^\xi$, where square
brackets mean “the units of” and $L$ stands for units of length. We
then have $\sigma = \gamma + \xi$. Due to the ambiguities referred to above, we
can pick any values we like for these three constants, as long as they
obey this rule. I find $(\sigma,\gamma,\xi) = (1,0,1)$ to be natural and
convenient, but Dicke, in the above-referenced paper, likes $(1,1,0)$, while
the mathematician Terry Tao advocates $(0,\mp 1,\pm 1)$.
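The rule relating the three exponents is easy to check mechanically.
The following Python sketch confirms that each of the conventions quoted
above satisfies it, and that an arbitrary triple generally does not. The
function name and the dictionary labels are illustrative choices, not
anything taken from the cited authors.

```python
def is_consistent(sigma, gamma, xi):
    """Check [ds]^2 = [g_ab][dx^a][dx^b], i.e. 2*sigma == 2*gamma + 2*xi."""
    return 2 * sigma == 2 * gamma + 2 * xi

conventions = {
    "(1, 0, 1)": (1, 0, 1),           # metric unitless, displacements carry length
    "Dicke (1, 1, 0)": (1, 1, 0),     # the metric carries the units
    "Tao, upper signs": (0, -1, 1),
    "Tao, lower signs": (0, 1, -1),
}

for label, exponents in conventions.items():
    print(label, is_consistent(*exponents))

# A triple chosen at random usually violates the rule.
print("(1, 1, 1)", is_consistent(1, 1, 1))
```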

Suppose we raise and lower indices to form a tensor with $r$ upper
indices and $s$ lower indices. We refer to this as a tensor of rank
$(r,s)$. (We don’t count contracted indices, e.g., $u^a v_a$ is a rank-(0,0)
scalar.) Since the metric is the tool we use for raising and lowering
indices, and the units of the lower-index form of the metric are $L^{2\gamma}$,
it follows that the units vary in proportion to $L^{\gamma(s-r)}$. In general,
you can assign a physical quantity units $L^u$ that are a product of
two factors, a “kinematical” or purely geometrical factor $L^k$, where
$k = \gamma(s-r)$, and a dynamical factor $L^d \ldots$, which can depend on
what kind of quantity it is, and where the $\ldots$ indicates that if your
system of units has more than just one base unit, those can be in
there as well. Dicke uses units with $\hbar = c = 1$, for example, so
there is only one base unit, and mass has units of inverse length and
$d_\text{mass} = -1$. In general relativity it would be more common to use
units in which $G = c = 1$, which instead give $d_\text{mass} = +1$.

The units of momentum Example 15 

Consider the equation 

$$p^a = mv^a$$

for the momentum of a material particle. Suppose we use
special-relativistic units in which $c = 1$, but because gravity isn’t
incorporated into the theory, $G$ plays no special role, and it is natural to
use a system of units in which there is a base unit of mass $M$.

7 For a modern and rigorous development of differential geometry along these
lines, see Nowik and Katz, arxiv.org/abs/1405.0984.

The kinematic units check out, because $k_p = k_m + k_v$:

$$\gamma(-1) = \gamma(0) + \gamma(-1)$$

This is merely a matter of counting indices, and was guaranteed 
to check out as long as the indices were written in a grammatical 
way on both sides of the equation. What this check is essentially 
telling us is that if we were to establish Minkowski coordinates in 
a neighborhood of some point, and do a change of coordinates 
$(t,x,y,z) \rightarrow (\alpha t,\alpha x,\alpha y,\alpha z)$, then the quantities on both sides
of the equation would vary under the tensor transformation laws
according to the same exponent of $\alpha$. For example, if we changed
from meters to centimeters, the equation would still remain valid. 

For the dynamical units, suppose that we use $(\sigma,\gamma,\xi) = (1,0,1)$,
so that an infinitesimal displacement $dx^a$ has units of length $L$, as
does proper time $ds$. These two quantities are purely kinematic,
so we don’t assign them any dynamical units, and therefore the
velocity vector $v^a = dx^a/ds$ also has no dynamical units. Our
choice of a system of units gives $[m] = M$. We require that the
equation $p^a = mv^a$ have dynamical units that check out, so:

$$M = 1 \cdot M$$

That is, we must also assign units of mass to the momentum.

A system almost identical to this one, but with different termi- 
nology, is given by Schouten. 8 

For practical purposes in checking the units of an equation, we 
can see from example 15 that worrying about the kinematic units 
is a waste of time as long as we have checked that the indices are 
grammatical. We can therefore give a simplified method that suffices 
for checking the units of any equation in abstract index notation. 

1. We assign a tensor the same units that one of its concrete
components would have if we were to adopt (local) Minkowski
coordinates, in the system with $(\sigma,\gamma,\xi) = (1,0,1)$. These
are the units we would automatically have imputed to it after
learning special relativity but before learning about tensors or
fancy coordinate transformations. Since $\gamma = 0$, the positions
of the indices do not affect the result.

2. The units of a sum are the same as the units of the terms. 

3. The units of a tensor product are the product of the units of 
the factors. 
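The three rules above can be mimicked in a few lines of Python. This is
only a sketch under the stated convention, with $c = 1$ so that time
counts as length and with a separate base unit of mass; the Units class
and the helper units_of_sum are hypothetical names introduced for
illustration. It repeats the check of the momentum equation from
example 15.

```python
class Units:
    """Units of the form L^length * M^mass, with c = 1."""

    def __init__(self, length=0, mass=0):
        self.length = length
        self.mass = mass

    def __mul__(self, other):
        # Rule 3: units of a tensor product are the product of the units.
        return Units(self.length + other.length, self.mass + other.mass)

    def __truediv__(self, other):
        return Units(self.length - other.length, self.mass - other.mass)

    def __eq__(self, other):
        return (self.length, self.mass) == (other.length, other.mass)

    def __repr__(self):
        return f"L^{self.length} M^{self.mass}"


def units_of_sum(*terms):
    """Rule 2: every term must carry the same units, which the sum inherits."""
    first = terms[0]
    if any(term != first for term in terms):
        raise ValueError("terms in a sum have mismatched units")
    return first


L = Units(length=1)
M = Units(mass=1)

# Rule 1: use the units a Minkowski component would have;
# the positions of the indices are ignored.
dx = L          # displacement dx^a
ds = L          # proper time
v = dx / ds     # velocity v^a = dx^a/ds
m = M           # mass
p = m * v       # momentum p^a = m v^a

print("v:", v)
print("p:", p)
```

Adding a length to a mass with units_of_sum(L, M) raises an error, which
is the code's version of noticing that an equation's units don't check
out.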


8 Tensor Analysis for Physicists, ch. VI

9.7 ★ Notations for tensors

Johnny is an American grade-school kid who has had his tender 
mind protected from certain historical realities, such as the political 
status of slaves, women, and Native Americans in the early United 
States. If Johnny ever tries to read the U.S. Constitution, he will be 
very confused by certain passages, such as the infamous three-fifths 
clause referring opaquely to “all other persons.” 

This optional section is meant to expose you to some similar his- 
torical ugliness involving tensor notation, knowledge of which may 
be helpful if you learn general relativity in the future. As in the 
evolution of the U.S. Constitution and its interpretation, we will 
find that not all the changes have been improvements. In sections
9.7.1-9.7.2 we briefly recapitulate some notations that have already
been introduced, and then in sections 9.7.3-9.7.4 we introduce two
new ones.

9.7.1 Concrete index notation 

A displacement vector is our prototypical example of a tensor, 
and the original nineteenth-century approach was to associate this 
tensor with the changes in the coordinates. Tensors achieve their full 
importance in differential geometry, where space (or spacetime, in 
general relativity) may be curved, in the sense defined in section 2.2, 
p. 45. In this context, only infinitesimally small displacements qual- 
ify as vectors; to see this, imagine displacements on a sphere, which 
do not commute for the reasons described in section 8.3.1, p. 170. 
On small scales, the sphere’s curvature is not apparent, which is 
why we need to make our displacements infinitesimal. Thus in this 
approach, the simplest example of a relativistic tensor occurs if we 
pick Minkowski coordinates to describe a region of spacetime that 
is small enough for the curvature to be negligible, and we associate 
a displacement vector with a 4-tuple of infinitesimal changes in the 
coordinates: 

$$(dt, dx, dy, dz)$$

Until about 1960, this carried the taint of the lack of rigor believed 
to be associated with Leibniz-style infinitesimal numbers, but this 
difficulty was resolved and is no longer an argument against the 
notation. 9 

9.7.2 Coordinate-independent notation 

A more valid reason for disliking the old-school notation is that,
as described in ch. 7, p. 143, it is desirable to avoid writing every
line of mathematics in a notation that explicitly refers to a choice
of coordinates. We might therefore prefer, as Penrose began
advocating around 1970, to notate this vector in coordinate-independent
notation such as “birdtracks” (section 6.1.3, p. 126),

→dx,

or the synonymous abstract index notation (section ??, p. ??),

$$dx^a,$$

where the use of the Latin letter $a$ means that we’re not referring to
any coordinate system, $a$ doesn’t take on values such as 1 or 2, and
$dx^a$ refers to the entire object →dx, not to some real number or set
of real numbers.

9 For a thorough development of the “back-to-the-future” use of infinitesimals
for this purpose, see Nowik and Katz, arxiv.org/abs/1405.0984.

Unfortunately for the struggling student of relativity, there are 
at least two more notations now in use, both of them incompatible 
in various ways with the ones we’ve encountered so far. 

9.7.3 Cartan notation 

Our notation involving upper and lower indices is descended
from a similar-looking one invented in 1853 by Sylvester. 10 In this
system, vectors are thought of as invariant quantities. We write a
vector in terms of a basis $\{\mathbf{e}_\mu\}$ as $\mathbf{x} = \sum x^\mu \mathbf{e}_\mu$. Since $\mathbf{x}$ is considered
invariant, it follows that the components $x^\mu$ and the basis vectors
must transform in opposite ways. For example, if we convert from
meters to centimeters, the $x^\mu$ get a hundred times bigger, which is
compensated for by a corresponding shrinking of the basis vectors
by 1/100.
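The compensation between components and basis vectors can be made
concrete with a small numerical sketch. The Python below uses exact
fractions; the function names, and the choice to represent each basis
vector by its Cartesian coordinates in a fixed reference unit (meters),
are assumptions made purely for illustration.

```python
from fractions import Fraction

def rescale(components, basis, factor):
    """Change units: components grow by `factor`, basis vectors shrink by it."""
    new_components = [c * factor for c in components]
    new_basis = [[entry / factor for entry in e] for e in basis]
    return new_components, new_basis

def assemble(components, basis):
    """Form x = sum over mu of x^mu e_mu, expressed in the reference unit."""
    dim = len(basis[0])
    return [sum(c * e[i] for c, e in zip(components, basis)) for i in range(dim)]

# A vector with components (3, 1/2) relative to a basis of 1-meter vectors.
components_m = [Fraction(3), Fraction(1, 2)]
basis_m = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

# Meters to centimeters: components x100, basis vectors x1/100.
components_cm, basis_cm = rescale(components_m, basis_m, 100)

print(components_cm)
print(assemble(components_m, basis_m) == assemble(components_cm, basis_cm))
```

The components change, but the assembled vector does not, which is the
sense in which the vector is invariant in this system.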

This notation clashes with normal index notation in certain
ways. One gotcha is that we can’t infer the rank of an expression
by counting indices. For example, $\mathbf{x} = \sum x^\mu \mathbf{e}_\mu$ is notated as if
it were a scalar, but this is actually a notation for a vector.

Circa 1930, Élie Cartan augmented this notation with a trick
that is perhaps a little too cute for its own good. He noted that the
partial differentiation operators $\partial/\partial x^\mu$ could be used as a basis for
a vector space whose structure is the same as the space of ordinary
vectors. In the modern context we rewrite the operator $\partial/\partial x^\mu$ as $\partial_\mu$
and use the Einstein summation convention, so that in the Cartan
notation we express a vector in terms of its components as

$$\mathbf{x} = x^\mu \partial_\mu.$$

In the Cartan notation, the symbol $dx^\mu$ is hijacked in order to
represent something completely different than it normally does; it’s
taken to mean the dual vector corresponding to $\partial_\mu$. The set $\{dx^\mu\}$
is used as a basis for notating covectors.

A further problem with the Cartan notation arises when we try 
to use it for dimensional analysis (see section 9.7.5). 

10 An easily obtainable modern description is given in Coxeter, Introduction to
Geometry.

9.7.4 Index-free notation


Independently of Penrose and the physics community, mathematicians
invented a different coordinate-free notation, one without
indices. In this notation, for example, we would notate the magnitude
of a vector not as $v^a v_a$ or $g_{ab}v^a v^b$ but as

$$g(v, v).$$

This notation is too clumsy for use in complicated expressions in- 
volving tensors with many indices. As shown in section 9.7.5, it is 
also not compatible with the way physicists are accustomed to doing 
dimensional analysis. 

9.7.5 Incompatibility of Cartan and index-free notation with 
dimensional analysis 

In section 9.6 we developed a system of dimensional analysis for 
use with abstract index notation. Here we discuss the issues that 
arise when we attempt to mix in other notational systems. 

One of the hallmarks of index-free notation is that it uses
non-multiplicative notation for many tensor products that would have
been written as multiplication in index notation, e.g., $g(v, v)$ rather
than $v^a v_a$. This makes the system clumsy to use for dimensional
analysis, since we are accustomed to reasoning about units based on
the assumption that the units of any term in an equation equal the
product of the units of its factors.

In Cartan notation we have the problem that certain notations,
such as $dx^\mu$, are completely redefined. The remainder of this section
is devoted to exploring what goes wrong when we attempt to
extend the analysis of section 9.6 to include Cartan notation. Let
vector $\mathbf{r}$ and covector $\boldsymbol{\omega}$ be duals of each other, and let $\mathbf{r}$ represent
a displacement. In Cartan notation, we write these vectors in terms
of their components, in some coordinate system, as follows:

$$\mathbf{r} = r^\mu \partial_\mu \qquad (5)$$

$$\boldsymbol{\omega} = \omega_\mu\,dx^\mu \qquad (6)$$

Suppose that the coordinates are Minkowski. Reading from left to
right and from top to bottom, there are six quantities occurring in
these equations. We attribute to them the units $L^A, L^B, \ldots, L^F$. If
we follow the rule that multiplicative notation is to imply
multiplication of units, then

$$A = B + C \qquad (7)$$

and

$$D = E + F. \qquad (8)$$

For compatibility with the system in section 9.6, equations 5-6
require

$$A + D = 2\sigma \qquad (9)$$

and

$$D = 2\gamma + B. \qquad (10)$$

To avoid a clash between Cartan and concrete index notation in
a Minkowski coordinate system, it would appear that we want the
following three additional conditions.


$F = \xi$: units of Cartan $dx^\mu$ not to clash with units of $dx^\mu$ (11)

$C = -\xi$: units of Cartan $\partial_\mu$ not to clash with units of the derivative (12)

$B = \xi$: units of components in Cartan notation not to clash with units of $dx^\mu$ (13)

We have 6 unknowns and 7 constraints, so in general Cartan notation
cannot be incorporated into this system without some constraint
on the exponents $(\sigma,\gamma,\xi)$. In particular, we require $\xi = 0$,
which is not a choice that most physicists prefer.
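To see where this requirement comes from, the constraints can be
combined directly. The following derivation is a sketch that uses only
equations 7, 9, 10 and 11-13, together with the rule $\sigma = \gamma + \xi$
from section 9.6:

$$\begin{aligned}
A &= B + C = \xi - \xi = 0 && \text{(from 7, 12, 13)} \\
D &= 2\gamma + B = 2\gamma + \xi && \text{(from 10, 13)} \\
A + D &= 2\gamma + \xi = 2\sigma = 2\gamma + 2\xi && \text{(from 9 and } \sigma = \gamma + \xi\text{)}
\end{aligned}$$

The last line can only hold if $\xi = 0$. Equation 8 then merely fixes the
one remaining unknown, $E = D - F = 2\gamma$.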
Problems


1 Rewrite the stress-energy tensor of a perfect fluid in SI units. 
For air at sea level, compare the sizes of its components. 

2 Prove by direct computation that if a rank-2 tensor is sym- 
metric when expressed in one Minkowski frame, the symmetry is 
preserved under a boost. 

3 Consider the following change of coordinates: 

$$\begin{aligned}
t' &= -t \\
x' &= x \\
y' &= y \\
z' &= z
\end{aligned}$$

This is called a time reversal. As in example 6 on p. 181, find the 
effect on the stress-energy tensor. 

4 Show that in Minkowski coordinates in flat spacetime, all 
Christoffel symbols vanish. 

5 Show that if the differential equation for geodesics on page
199 is satisfied for one affine parameter $\lambda$, then it is also satisfied for
any other affine parameter $\lambda' = a\lambda + b$, where $a$ and $b$ are constants.

6 This problem investigates a notational conflict in the
description of the metric tensor using index notation. Suppose that
we have two different metrics, $g_{\mu\nu}$ and $g'_{\mu\nu}$. The difference of two
rank-2 tensors is also a rank-2 tensor, so we would like the quantity
$\delta g_{\mu\nu} = g'_{\mu\nu} - g_{\mu\nu}$ to be a well-behaved tensor both in its
transformation properties and in its behavior when we manipulate its indices.
Now we also have $g^{\mu\nu}$ and $g'^{\mu\nu}$, which are defined as the matrix
inverses of their lower-index counterparts; this is a special property
of the metric, not of rank-2 tensors in general. We can then define
$\delta g^{\mu\nu} = g'^{\mu\nu} - g^{\mu\nu}$. (a) Use a simple example to show that $\delta g_{\mu\nu}$
and $\delta g^{\mu\nu}$ cannot be computed from one another in the usual way
by raising and lowering indices. (b) Find the general relationship
between $\delta g^{\mu\nu}$ and $\delta g_{\mu\nu}$.

7 In section 9.5.5 on p. 206, we analyzed the Bell spaceship
paradox using the expansion scalar and the Herglotz-Noether theorem.
Suppose that we carry out a similar analysis, but with the
congruence defined by $x^2 - t^2 = a^{-2}$. The motivation for considering
this congruence is that its world-lines have constant proper
acceleration $a$, and each such world-line has a constant value of the
coordinate $X$ in the system of accelerated coordinates (Rindler
coordinates) described in section 7.1, p. 143. Show that the expansion
tensor vanishes. The interpretation is that it is possible to apply
a carefully planned set of external forces to a straight rod so that
it accelerates along its own length without any stress, i.e., while
remaining Born-rigid.

Chapter 10

Electromagnetism 


10.1 Relativity requires magnetism 

Figure a/1 is an unrealistic model of a charged particle moving
parallel to a current-carrying wire. What electrical force does the lone
particle in figure a/1 feel? Since the density of “traffic” on the two 
sides of the “road” is equal, there is zero overall electrical force on 
the lone particle. Each “car” that attracts the lone particle is paired 
with a partner on the other side of the road that repels it. If we 
didn’t know about magnetism, we’d think this was the whole story: 
the lone particle feels no force at all from the wire. 

Figure a/2 shows what we’d see if we were observing all this from 
a frame of reference moving along with the lone charge. Relativity 
tells us that moving objects appear contracted to an observer who is 
not moving along with them. Both lines of charge are in motion in 
both frames of reference, but in frame 1 they were moving at equal 
speeds, so their contractions were equal. In frame 2, however, their 
speeds are unequal. The dark charges are moving more slowly than 
in frame 1, so in frame 2 they are less contracted. The light-colored 
charges are moving more quickly, so their contraction is greater now. 
The “cars” on the two sides of the “road” are no longer paired off, 
so the electrical forces on the lone particle no longer cancel out as 
they did in a/1. The lone particle is attracted to the wire, because 
the particles attracting it are more dense than the ones repelling it. 
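The imbalance can be quantified with a short calculation. The following
Python sketch works in units with $c = 1$ and makes some assumptions that
are not in the text: in frame 1 the two lines of charge have linear
densities +lam and -lam and move at velocities +v and -v, and the lone
particle moves at velocity u. Each line's density in frame 2 is its
rest-frame density multiplied by the gamma factor for its velocity in
frame 2, which is found by relativistic velocity addition. The function
names are illustrative.

```python
import math

def gamma(v):
    """Lorentz factor for velocity v, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def velocity_in_frame2(v, u):
    """A velocity v measured in frame 1, as seen from a frame moving at u."""
    return (v - u) / (1.0 - u * v)

def net_density_frame2(lam, v, u):
    """Net linear charge density of the wire in the lone particle's rest frame."""
    rest_density = lam / gamma(v)   # density of either line in its own rest frame
    plus = +rest_density * gamma(velocity_in_frame2(+v, u))
    minus = -rest_density * gamma(velocity_in_frame2(-v, u))
    return plus + minus

lam, v, u = 1.0, 0.3, 0.3
print(net_density_frame2(lam, v, 0.0))   # particle at rest in frame 1
print(net_density_frame2(lam, v, u))     # particle moving along with the + line
print(-2.0 * lam * gamma(u) * u * v)     # closed form for comparison
```

With u = 0 the two densities cancel, as in figure a/1. For nonzero u the
line moving opposite to the particle is the denser one, and the net
density works out to $-2\lambda\gamma(u)uv$ under these assumptions.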

Now observers in frames 1 and 2 disagree about many things, 
but they do agree on concrete events. Observer 2 is going to see 
the lone particle drift toward the wire due to the wire’s electrical 
attraction, gradually speeding up, and eventually hit the wire. If 2 
sees this collision, then 1 must as well. But 1 knows that the total 
electrical force on the lone particle is exactly zero. There must be 
some new type of force. She invents a name for it: magnetism. 

Magnetism is a purely relativistic effect. Since relativistic effects
are down by a factor of $v^2$ compared to Newtonian ones, it’s
surprising that relativity can produce an effect as vigorous as the
attraction between a magnet and your refrigerator. The explanation
is that although matter is electrically neutral, the cancellation of 
electrical forces between macroscopic objects is extremely delicate, 
so anything that throws off the cancellation, even slightly, leads to 
a surprisingly large force. 


a / A model of a charged particle and a current-carrying wire,
seen in two different frames of reference. The relativistic length
contraction is highly exaggerated. The force on the lone particle is
purely magnetic in 1, and purely electric in 2.

b / Fields carry energy.


10.2 Fields in relativity 

Based on what we learned in section 10.1, the next natural step 
would seem to be to find some way of extending Coulomb’s law to 
include magnetism. For example, we could try to find a formula 
for the magnetic force between charges $q_1$ and $q_2$ based on not just
their relative positions but also on their velocities. The following 
considerations, however, tell us not to go down that path. 

10.2.1 Time delays in forces exerted at a distance 

Relativity forbids Newton’s instantaneous action at a distance 
(p. 17). Since forces can’t be transmitted instantaneously, it be- 
comes natural to imagine force-effects spreading outward from their 
source like ripples on a pond, and we then have no choice but to 
impute some physical reality to these ripples. We call them fields, 
and they have their own independent existence. 

Even empty space, then, is not perfectly featureless. It has
measurable properties. For example, we can drop a rock in order to
measure the direction of the gravitational field, or use a magnetic
compass to find the direction of the magnetic field. This concept
made a deep impression on Einstein as a child. He recalled that as
a five-year-old, the gift of a magnetic compass convinced him that
there was “something behind things, something deeply hidden.”

10.2.2 Fields carry energy. 

The smoking-gun argument for this strange notion of traveling 
force ripples comes from the fact that they carry energy. In fig- 
ure b/1, Alice and Betty hold positive charges A and B at some 
distance from one another. If Alice chooses to move her charge 
closer to Betty’s, b/2, Alice will have to do some mechanical work 
against the electrical repulsion, burning off some of the calories from 
that chocolate cheesecake she had at lunch. This reduction in her 
body’s chemical energy is offset by a corresponding increase in the 
electrical potential energy $q\Delta V$. Not only that, but Alice feels the
resistance stiffen as the charges get closer together and the repul- 
sion strengthens. She has to do a little extra work, but this is all 
properly accounted for in the electrical potential energy. 

But now suppose, b/3, that Betty decides to play a trick on Alice
by tossing charge B far away just as Alice is getting ready to move
charge A. We have already established that Alice can’t feel charge
B’s motion instantaneously, so the electric forces must actually be
propagated by an electric field. Of course this experiment is utterly
impractical, but suppose for the sake of argument that the time it
takes the change in the electric field to propagate across the
diagram is long enough so that Alice can complete her motion before
she feels the effect of B’s disappearance. She is still getting stale
information about B’s position. As she moves A to the right, she
feels a repulsion, because the field in her region of space is still the
field caused by B in its old position. She has burned some chocolate
cheesecake calories, and it appears that conservation of energy has
been violated, because these calories can’t be properly accounted
for by any interaction with B, which is long gone.

If we hope to preserve the law of conservation of energy, then
the only possible conclusion is that the electric field itself carries
away the cheesecake energy. In fact, this example represents an
impractical method of transmitting radio waves. Alice does work
on charge A, and that energy goes into the radio waves. Even if B
had never existed, the radio waves would still have carried energy,
and Alice would still have had to do work in order to create them.

10.2.3 Fields must have transformation laws 

In the foregoing discussion I’ve been guilty of making arguments
that fields were “real.” Sorry. In physics, and particularly in
relativity, it’s usually a waste of time worrying about whether some
effect such as length contraction is “real” or only “seems that way.”
But thinking of fields as having an independent existence does lead
to a useful guiding principle, which is that fields must have
transformation laws. Suppose that at a certain location, observer $o_1$
measures every possible field: electric, magnetic,
bodice-ripper-sexual-attractional, and so on. (The gravitational field is not on the
list, for the reasons discussed in section 5.2.) Observer $o_2$, passing
by the same event but in a different state of motion, could carry
out similar measurements. We’re talking about measurements
being carried out on a cubic inch of pure vacuum, but suppose that the
answer to Peggy Lee’s famous question is “Yes, that’s all there is”:
the only information there is to know about that empty parcel of
nothingness is the (frame-dependent) value of the fields it contains.
Then $o_1$ ought to be able to predict the results of $o_2$’s measurements.
For if not, then what is the nature of the information that is hidden
from $o_1$ but revealed to $o_2$? Presumably this would be something
related to how the fields were produced by certain particles long ago
and far away. For example, maybe $o_1$ is at rest relative to a certain
charge $q$ that helped to create the fields, but $o_2$ isn’t, so $o_2$ picks
up $q$’s magnetic field, which is information unavailable to $o_1$, who
thinks $q$ was at rest, and therefore didn’t make any magnetic field.
This would contradict our “that’s all there is” hypothesis.

To show the power of “that’s all there is,” consider example 1,
p. 176, in which we found that boosting a solenoid along its own axis
doesn’t change its internal field. As a fact about solenoids, it’s fairly
obscure and useless. But if the fields must have transformation laws,
then we’ve learned something much more general: a magnetic field
always stays the same under a boost in the direction of the field.

10.3 Electromagnetic fields

10.3.1 The electric field 

Section 10.1 showed that relativity requires magnetic forces to
exist, and section 10.2.3 gave us a peek at what this implies about
how electric and magnetic fields transform. To understand this on a
more general basis, let’s explicitly list some assumptions about the
electric field and see how they lead to the existence and properties
of a magnetic field:

1. Definition of the electric field: In the frame of reference of an
inertial observer $o$, take some standard, charged test particle,
release it at rest, and observe the force $\mathbf{F}_o$ (section 4.5, p. 100)
acting on the particle. (The timelike component of this force
vanishes.) Then the electric field three-vector $\mathbf{E}$ in frame $o$ is
defined by $\mathbf{F}_o = q\mathbf{E}$, where we fix our system of units by taking
some arbitrary value for the charge $q$ of the test particle.

2. Definition of electric charge: For charges other than the stan- 
dard test charge, we take Gauss’s law to be our definition of 
electric charge. 

3. Charge is Lorentz invariant (p. 22). 

4. Fields must have transformation laws (section 10.2.3). 

Many times already in our study of relativity, we’ve followed
the strategy of taking a Galilean vector and trying to redefine it
as a four-dimensional vector in relativity. Let’s try to do this with
the electric field. Then we would have no other obvious thing to
try than to change its definition to $\mathbf{F} = q\mathbf{E}$, where $\mathbf{F} = m\mathbf{a}$ is the
relativistic force vector (section 4.5, p. 100), so that the electric field
three-vector was just the spacelike part of $\mathbf{E}$. Because $\mathbf{a} \cdot \mathbf{v} = 0$ for
a material particle, this would imply that $\mathbf{E}$ was orthogonal to $\mathbf{o}$
for any observer $\mathbf{o}$. But this is impossible, since then a spacetime
displacement vector $\mathbf{s}$ along the direction of $\mathbf{E}$ would be a vector of
simultaneity for all observers, and we know that this isn’t possible
in relativity.

10.3.2 The magnetic field 

Our situation is very similar to the one encountered in section
9.1, p. 175, where we found that knowledge of the charge density
in one frame was insufficient to tell us the charge density in other
frames. There was missing information, which turned out to be
the current density. The problems we’ve encountered in defining
the transformation properties of the electric field suggest a similar
“missing-information” situation, and it seems likely that the
missing information is the magnetic field. How should we modify the
assumptions on p. 218 to allow for the existence of a magnetic field
in addition to the electric one? What properties could this
additional field have? How would we define or measure it?

One way of imagining a new type of field would be if, in addition
to charge $q$, particles had some other characteristic, call it r, and
there would then be some entirely separate field defined by their action
on a particle with this “r-ness.” But going down this road leads us
to unrelated phenomena such as the strong nuclear interaction.

10.3.3 The electromagnetic field tensor 

The nature of the contradiction arrived at in section 10.3.1 is
such that our additional field is closely linked to the electric one,
and therefore we expect it to act on charge, not on r-ness. Without
inventing something new like r-ness, the only other available
property of the test particle is its state of motion, characterized by
its velocity vector $\mathbf{v}$. Now the simplest rule we could imagine for
determining the force on a test particle would be a linear one, which
would look like matrix multiplication:

$$\mathbf{F} = q\mathcal{F}\mathbf{v}$$

or in index notation,

$$F^a = q\mathcal{F}^a{}_b v^b.$$

Although the form $\mathcal{F}^a{}_b$ with one upper and one lower index occurs
naturally in this expression, we’ll find it more convenient from now
on to work with the upper-upper form $\mathcal{F}^{ab}$. $\mathcal{F}$ would be $4 \times 4$, so it
would have 16 elements:

$$\begin{pmatrix}
\mathcal{F}^{tt} & \mathcal{F}^{tx} & \mathcal{F}^{ty} & \mathcal{F}^{tz} \\
\mathcal{F}^{xt} & \mathcal{F}^{xx} & \mathcal{F}^{xy} & \mathcal{F}^{xz} \\
\mathcal{F}^{yt} & \mathcal{F}^{yx} & \mathcal{F}^{yy} & \mathcal{F}^{yz} \\
\mathcal{F}^{zt} & \mathcal{F}^{zx} & \mathcal{F}^{zy} & \mathcal{F}^{zz}
\end{pmatrix}$$


Presumably these 16 numbers would encode the information about 
the electric field, as well as some additional information about the 
field or fields we were missing. 

But these are not 16 numbers that we can choose freely and
independently. For example, consider a charged particle that is
instantaneously at rest in a certain observer’s frame, with $\mathbf{v} = (1, 0, 0, 0)$.
(In this situation, the four-force equals the force measured by the
observer.) The work done by a force is positive if the force is in the
same direction as the motion, negative if in the opposite direction,
and zero if there is no motion. Therefore the power $P = dW/dt$
in this example should be zero. Power is the timelike component of
the force vector, which forces us to take $\mathcal{F}^{tt} = 0$.

More generally, consider the kinematical constraint $\mathbf{a} \cdot \mathbf{v} = 0$
(p. 64). When we require $\mathbf{a} \cdot \mathbf{v} = 0$ for any $\mathbf{v}$, not just this one, we
end up with the constraint that $\mathcal{F}$ must be antisymmetric, meaning
that when we transpose it, the result is another matrix that looks
just like the original one, but with all the signs flipped:

$$\begin{pmatrix}
0 & \mathcal{F}^{tx} & \mathcal{F}^{ty} & \mathcal{F}^{tz} \\
-\mathcal{F}^{tx} & 0 & \mathcal{F}^{xy} & \mathcal{F}^{xz} \\
-\mathcal{F}^{ty} & -\mathcal{F}^{xy} & 0 & \mathcal{F}^{yz} \\
-\mathcal{F}^{tz} & -\mathcal{F}^{xz} & -\mathcal{F}^{yz} & 0
\end{pmatrix}$$


Each element equals minus the corresponding element across the 
main diagonal from it, and antisymmetry also requires that the main 
diagonal itself be zero. In terms of the concept of degrees of free- 
dom introduced in section 3.5.3, p. 62, we are down to 6 degrees of 
freedom rather than 16. 
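A quick numerical experiment illustrates why antisymmetry is exactly
what the constraint needs. In the Python sketch below, which assumes
the metric signature (+,-,-,-) and uses made-up random entries, the
four-force is computed as $F^a = q\mathcal{F}^{ab}v_b$. Contracting it with $v_a$
gives zero for an antisymmetric matrix, but in general not for an
arbitrary one. All function names are illustrative.

```python
import math
import random

ETA = [1.0, -1.0, -1.0, -1.0]   # diagonal of the metric, signature (+,-,-,-)

def lower(vec):
    """Lower the index of a four-vector: v_a = eta_ab v^b."""
    return [ETA[i] * vec[i] for i in range(4)]

def four_velocity(ux, uy, uz):
    """Velocity four-vector for three-velocity (ux, uy, uz), with c = 1."""
    g = 1.0 / math.sqrt(1.0 - (ux * ux + uy * uy + uz * uz))
    return [g, g * ux, g * uy, g * uz]

def force(q, field, v):
    """F^a = q * field^{ab} * v_b."""
    v_low = lower(v)
    return [q * sum(field[a][b] * v_low[b] for b in range(4)) for a in range(4)]

def force_dot_velocity(q, field, v):
    """F^a v_a, which is proportional to a . v and must vanish."""
    f = force(q, field, v)
    v_low = lower(v)
    return sum(f[a] * v_low[a] for a in range(4))

rng = random.Random(0)
generic = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
antisym = [[generic[a][b] - generic[b][a] for b in range(4)] for a in range(4)]

v = four_velocity(0.2, -0.5, 0.4)
print(force_dot_velocity(1.0, generic, v))   # arbitrary matrix
print(force_dot_velocity(1.0, antisym, v))   # zero up to rounding error
```

The antisymmetric matrix is fixed by the six entries above its main
diagonal, matching the count of degrees of freedom.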

We now relabel the elements of the matrix and follow up with
a justification of the relabeling. The result is the following rank-2
tensor:

$$\mathcal{F}^{\mu\nu} = \begin{pmatrix}
0 & -E_x & -E_y & -E_z \\
E_x & 0 & -B_z & B_y \\
E_y & B_z & 0 & -B_x \\
E_z & -B_y & B_x & 0
\end{pmatrix}$$


We’ll call this the electromagnetic field tensor. The labeling of the
left column simply expresses the definition of the electric field, which
is expressed in terms of the velocity $\mathbf{v} = (1,0,0,0)$ of a particle at
rest. The top row then follows from antisymmetry. For an arbitrary
velocity vector, writing out the matrix multiplication $F^\mu = q\mathcal{F}^\mu{}_\nu v^\nu$
results in expressions such as $F^x = \gamma q(E_x + u_y B_z - u_z B_y)$ (problem 3,
p. 237). Taking into account the difference of a factor of $\gamma$ between
the four-force and the force measured by an observer, we end up
with the familiar Lorentz force law,

$$\mathbf{F}_o = q(\mathbf{E} + \mathbf{u} \times \mathbf{B}),$$

where $\mathbf{B}$ is the magnetic field. This is expressed in units where $c = 1$,
so that the electric and magnetic field have the same units. In units
with $c \neq 1$, the magnetic components of the electromagnetic field
matrix should be multiplied by $c$.

Thus starting only from the assumptions on p. 218, we deduce 
that the electric field must be accompanied by a magnetic field. 
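The matrix multiplication is tedious by hand but easy to check
numerically. The following Python sketch assumes units with $c = 1$ and
the signature (+,-,-,-). It builds the field tensor with the component
layout given above, computes $F^\mu = q\mathcal{F}^{\mu\nu}v_\nu$, and compares the spatial
part divided by $\gamma$ with $q(\mathbf{E} + \mathbf{u} \times \mathbf{B})$. The function names and the
sample field values are arbitrary.

```python
import math

def field_tensor(E, B):
    """Upper-upper electromagnetic field tensor, indices ordered (t, x, y, z)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]

def four_force(q, E, B, u):
    """Return (F^mu, gamma) for a particle with three-velocity u."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    v_lower = [g, -g * u[0], -g * u[1], -g * u[2]]   # v_nu, signature (+,-,-,-)
    F = field_tensor(E, B)
    force = [q * sum(F[m][n] * v_lower[n] for n in range(4)) for m in range(4)]
    return force, g

def lorentz_force(q, E, B, u):
    """Three-force q(E + u x B) measured by the observer."""
    cross = [
        u[1] * B[2] - u[2] * B[1],
        u[2] * B[0] - u[0] * B[2],
        u[0] * B[1] - u[1] * B[0],
    ]
    return [q * (E[i] + cross[i]) for i in range(3)]

q, E, B, u = 2.0, (1.0, -0.5, 0.25), (0.3, 0.7, -0.2), (0.1, 0.4, -0.6)
F4, g = four_force(q, E, B, u)
print([component / g for component in F4[1:]])
print(lorentz_force(q, E, B, u))
```

The two printed lists should agree up to rounding. The timelike
component F4[0] divided by $\gamma$ is $q\mathbf{E} \cdot \mathbf{u}$, the power delivered by the
electric field.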

Parity properties of E and B Example 1

In example 6 on p. 181, we saw that under the parity transformation
$(t, x, y, z) \rightarrow (t, -x, -y, -z)$, any rank-2 tensor expressed
in Minkowski coordinates changes the signs of its components
according to the same rule:

$$\begin{pmatrix}
\text{no flip} & \text{flip} & \text{flip} & \text{flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip} \\
\text{flip} & \text{no flip} & \text{no flip} & \text{no flip}
\end{pmatrix}$$

Since this holds for the electromagnetic field tensor $\mathcal{F}$, we find
that under parity, $\mathbf{E} \rightarrow -\mathbf{E}$ and $\mathbf{B} \rightarrow \mathbf{B}$. For example, a capacitor
seen in a mirror has its electric field pointing the opposite way, but
there is no change in the magnetic field of a current loop, since
the location of each current element is flipped to the other side
of the loop, but its direction of flow is also reversed, so that the
picture as a whole remains unchanged.

10.3.4 What about gravity? 

A funny puzzle pops up if we go back and think about the as- 
sumptions on p. 218 that went into all this. Those assumptions were 
so general that it almost seems as though the only possible behavior 
for fields is the behavior of electric and magnetic fields. But other 
fields do behave differently. How did the assumptions fail in the 
case of gravity, for example? Gauss’s law (assumption 2) certainly 
holds for gravity. But the source of gravitational fields isn’t charge, 
it’s mass-energy, and mass-energy isn’t a Lorentz invariant, contrary 
to assumption 3. Furthermore, assumption 1 entailed that our field 
could be defined in terms of forces measured by an inertial observer, 
but for an inertial observer gravity doesn’t exist (section 5.2). 

10.4 Transformation of the fields 

Since we have associated the components of the electric and mag- 
netic fields with elements of a rank-2 tensor, the transformation law 
for these fields now follows from the general tensor transformation 
law for rank-2 tensors (p. 180). We first state the general rule, in 
a prettified form, and then give some concrete examples. Under a 
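boost by a three-velocity $\mathbf{v}$, the electric and magnetic fields $\mathbf{E}$ and
$\mathbf{B}$ transform to $\mathbf{E}'$ and $\mathbf{B}'$ according to these rules, in which the
subscripts $\parallel$ and $\perp$ refer to the components parallel and perpendicular to $\mathbf{v}$:

\[
\begin{aligned}
\mathbf{E}'_\parallel &= \mathbf{E}_\parallel, & \mathbf{E}'_\perp &= \gamma(\mathbf{E}_\perp + \mathbf{v} \times \mathbf{B}), \\
\mathbf{B}'_\parallel &= \mathbf{B}_\parallel, & \mathbf{B}'_\perp &= \gamma(\mathbf{B}_\perp - \mathbf{v} \times \mathbf{E}).
\end{aligned}
\]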
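
The rules above can be turned into a short routine. The following is a minimal Python sketch, assuming units with $c = 1$ and a boost speed less than 1; the name `boost_fields` and its helpers are hypothetical, chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def boost_fields(E, B, v):
    """Return (E', B') in a frame moving with three-velocity v (c = 1, |v| < 1)."""
    v2 = dot(v, v)
    if v2 == 0.0:
        return tuple(E), tuple(B)
    gamma = 1.0 / math.sqrt(1.0 - v2)

    def split(F):
        # Decompose F into parts parallel and perpendicular to v.
        scale = dot(F, v) / v2
        par = tuple(scale * vi for vi in v)
        perp = tuple(f - p for f, p in zip(F, par))
        return par, perp

    E_par, E_perp = split(E)
    B_par, B_perp = split(B)
    v_x_B = cross(v, B)  # automatically perpendicular to v
    v_x_E = cross(v, E)
    E_new = tuple(E_par[i] + gamma * (E_perp[i] + v_x_B[i]) for i in range(3))
    B_new = tuple(B_par[i] + gamma * (B_perp[i] - v_x_E[i]) for i in range(3))
    return E_new, B_new
```

For instance, a purely electric field $(0, 1, 0)$ boosted with $\mathbf{v} = (0.6, 0, 0)$, so that $\gamma = 1.25$, should come out as approximately $\mathbf{E}' = (0, 1.25, 0)$ and $\mathbf{B}' = (0, 0, -0.75)$. The combinations $B^2 - E^2$ and $\mathbf{E} \cdot \mathbf{B}$ should be unchanged by any such boost.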


A line of charge Example 2 

Figure c/1 shows a line of charges. At a given nearby point, it 
creates an electric field E that points outward, as measured by 
an observer o who is at rest relative to the charges. This field is 
represented in the figure by its pattern of field lines, which start 
on the charges and radiate outward like the bristles of a bottle 
brush. Because the charges are at rest, the magnetic field is 
zero. (Finding the magnitude of the field at a certain distance is a 
straightforward application of Gauss’s law.) 

Now consider an observer o', figure c/2, moving at velocity $\mathbf{v}$ to
the right relative to o. Without even worrying about how the field
was created, we can transform the fields, at the point in space
discussed previously, into the new frame. The result is $\mathbf{E}' = \gamma\mathbf{E}$
and $\mathbf{B}' = -\gamma\mathbf{v} \times \mathbf{E}$. In this frame, the electric field is more
intense, and there is also a magnetic field, whose pattern of white
field lines forms circles lying in planes perpendicular to the line.
If we do happen to know that the field was created by
the line of charge, which is moving according to o', then we can
explain these results as arising from two effects. First, the line of
charge has been length-contracted. This causes the density of
charge per unit length to increase by a factor of $\gamma$, with a proportional
increase in the electric field. In the field-line description, we
simply have more charges in the figure, so there are more field
lines coming out of them. Second, the line of charge is moving
to the left in this frame, so it forms an electric current, and this
current is the cause of the magnetic field $\mathbf{B}'$.

c / Example 2.

A moving charge Example 3 

Figure d/1 shows the electric field lines of a charge, in the charge’s 
rest frame K. In figure d/2 we see the same electric field, in a 
frame K' in which the charge is moving along the x axis, which 
points to the right, at 90% of c. (In this frame there is also a 
magnetic field, which is not shown.) This electric field, which is 
time-varying, is shown as a snapshot in a hyperplane of simul- 
taneity t' = 0 of K'. Surprisingly, these field lines all point toward 
the charge’s present position in K'. 

Disturbances in the electromagnetic field propagate at c, not in- 
stantaneously, so one might have expected the field at a certain 
location P in this figure to point toward a location at a distance r 
that the charge had occupied at an earlier time t' = -r/c. This 
would have produced a set of curved field lines reminiscent of the 
wake of a boat. To see that this is not possible, consider the point 
(0, 0, h, 0) in the Minkowski coordinates of K, i.e., a point on the
y axis. After a Lorentz transformation along x, the coordinates
of this point in K' are still (0, 0, h, 0), so in K' as well it lies on a
line that passes transversely through the present position of the 
charge. Since this point has E x = 0 and B = 0 in K, application of 
the transformation laws shows that E' x = 0 as well, so that the field 
points toward the charge’s present position, not its past position. 

A similar but more complicated calculation shows that the field at 
intermediate angles is also in the instantaneously radial direction. 
Rather than filling in the details, we note that this makes sense be- 
cause the Poynting vector E x B then has no radial component, 
which is as expected because energy should be transported for- 
ward but not radiated outward. 

One might worry that this would indicate that the information about 
the charge’s position was propagating instantaneously, contra- 
dicting relativity. But this is a charge that has always been in its 
current state of motion and always will be. If the charge’s motion
had been disturbed by some external force at a time later than 
t' = -r/c, the field lines in K would still be pointing toward the lo- 
cation that the charge had previously occupied while at rest, and 
the field in K' would be pointing toward its linearly extrapolated 
position. 

A field behaving like a stick Example 4 

Figure d/2 appears identical to a copy of figure d/1 that has been
Lorentz contracted by $1/\gamma$, and we can verify from the transformation
laws for the fields that this is correct. Since these transformation
laws apply regardless of how the fields were produced,
we have a general rule, which is that if a field is purely electric in
one frame, then its direction transforms to another frame in the
same way as the direction of a stick, when we transform out of
the stick’s rest frame. (See problem 3, p. 51.)

It is not true in general that electric field lines can simply be 
carried over from one frame to another as if we were Lorentz- 
contracting a rat’s nest built out of wire. This property holds only 
when the original frame is of a very special kind: a frame in which 
the field is purely electric. (We can always find such a frame if
$E^2 > B^2$ and $\mathbf{E} \cdot \mathbf{B} = 0$; see section 10.5.) As a counterexample to the notion
that it applies more generally, consider the case in which a field is 
purely magnetic in a certain frame. Then the electric field lines do 
not even exist in the original frame, but do exist in the new one. 

Coming back to the case where the original field is purely electric, 
so that the stick-like behavior does hold, it is not immediately ob- 
vious why there should be this strange correspondence between 
sticks and field lines. The methods used in problem 3 do not 
seem to have much in common with the ones we have used to 
determine how the electric field behaves. But the following physi- 
cal argument shows that there is a simple reason for the identical 
behavior. 

Consider a stick with charges +q and -q fixed at the ends. The 
stick is nonrotating and moving inertially. In the stick’s rest frame 
K, there is a field line originating from +q and terminating on -q 
which coincides with the stick. Now consider frame K' moving 
in some direction relative to the stick. As discussed in example 
3, the field due to each charge points toward or away from its 
present instantaneous position in K' as well as K. Therefore each 
field, at the stick, is parallel to the stick, and we again have a field 
line in K' that coincides with the stick. Since the transformation of 
the field is independent of how the field was created, this holds 
for any field that is purely electric in the original frame. 

10.5 Invariants


We’ve seen cases before in which an invariant can be formed from
a rank-1 tensor. The square of the proper time corresponding to a
timelike spacetime displacement $\mathbf{r}$ is $\mathbf{r} \cdot \mathbf{r}$ or, in the index notation
introduced in section ??, $r^a r_a$. From the momentum tensor we can
construct the square of the mass, $p^a p_a$.

There are good reasons to believe that something similar can 
be done with the electromagnetic field tensor, since electromagnetic 
fields have certain properties that are preserved when we switch 
frames. Specifically, an electromagnetic wave consists of electric 
and magnetic fields that are equal in magnitude and perpendicular 
to one another. An electromagnetic wave that is a valid solution 
to Maxwell’s equations in one frame should also be a valid wave in 
another frame. It can be shown that the following two quantities 
are invariants: 


\[ P = B^2 - E^2 \]

and

\[ Q = \mathbf{E} \cdot \mathbf{B}. \]

The fact that these are written as vector dot products of three- 
vectors shows that they are invariant under rotation, but we also 
want to show that they are relativistic scalars, i.e., invariant under 
boosts as well. To prove this, we can write them both in tensor 
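notation. The first invariant can be expressed as $P = \frac{1}{2}\mathcal{F}^{ab}\mathcal{F}_{ab}$,
while the second equals $Q = \frac{1}{8}\epsilon^{abcd}\mathcal{F}_{ab}\mathcal{F}_{cd}$, up to an overall sign that
depends on the orientation convention, where $\epsilon^{\kappa\lambda\mu\nu}$ is the Levi-Civita tensor.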
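
The tensor expressions for the invariants can be checked numerically. The sketch below is a minimal Python illustration under stated assumptions: signature $+---$, $c = 1$, the field tensor $\mathcal{F}^{\mu\nu}$ with $\mathbf{E}$ in its left column, and the orientation convention $\epsilon^{0123} = +1$. With that convention the contraction $\epsilon^{abcd}\mathcal{F}_{ab}\mathcal{F}_{cd}$ comes out equal to $-8\,\mathbf{E} \cdot \mathbf{B}$, so the code divides by $-8$. All names are illustrative.

```python
from itertools import permutations

# Diagonal of the Minkowski metric, signature +---, units with c = 1.
METRIC = (1.0, -1.0, -1.0, -1.0)

def field_tensor_upper(E, B):
    """Upper-index components F^{mu nu}, with E in the left column."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]

def permutation_sign(p):
    """Sign of a permutation of (0, 1, 2, 3), found by sorting with swaps."""
    p = list(p)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

def invariants_from_tensor(E, B):
    """Return (P, Q) computed by index contraction rather than from three-vectors."""
    F_up = field_tensor_upper(E, B)
    # Lower both indices: F_{ab} = g_{aa} g_{bb} F^{ab} for a diagonal metric.
    F_dn = [[METRIC[a] * METRIC[b] * F_up[a][b] for b in range(4)]
            for a in range(4)]
    P = 0.5 * sum(F_up[a][b] * F_dn[a][b] for a in range(4) for b in range(4))
    # Only permutations of (0, 1, 2, 3) have a nonzero Levi-Civita symbol.
    eps_FF = sum(permutation_sign(p) * F_dn[p[0]][p[1]] * F_dn[p[2]][p[3]]
                 for p in permutations(range(4)))
    Q = -eps_FF / 8.0  # sign assumes epsilon^{0123} = +1
    return P, Q
```

For $\mathbf{E} = (1, 2, 3)$ and $\mathbf{B} = (0.5, -1, 2)$, the three-vector formulas give $P = B^2 - E^2 = -8.75$ and $Q = \mathbf{E} \cdot \mathbf{B} = 4.5$, and `invariants_from_tensor` should agree.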

A field for which both P = Q = 0 is called a null field. An elec- 
tromagnetic plane wave is a null field, and although this is easily 
verifiable from the definitions of P and Q, there is a deeper reason 
why this should be true, and this reason applies not just to electro- 
magnetic waves but to other types of waves, such as gravitational 
waves. Consider any relativistic scalar $s$ that is a continuous function
of the electromagnetic field tensor $\mathcal{F}$, i.e., a continuous function
of $\mathcal{F}$'s components. We want $s$ to vanish when $\mathcal{F} = 0$. Given an
electromagnetic plane wave, we can do a Lorentz boost parallel to
the wave’s direction of propagation. Under such a boost the wave
suffers a Doppler shift in its wavelength and frequency, but in addition
to that, the transformation equations on p. 221 imply that
the intensity of the fields is reduced at any given point. Thus in the
limit of an indefinite process of acceleration, $\mathcal{F} \to 0$, and therefore
$s \to 0$ as well. But since $s$ is a scalar, its value is independent of our
frame of reference, and so it must be zero in all frames.

P and Q are a complete set of invariants for the electromagnetic
field, meaning that the only other electromagnetic invariants are 
those that either can be determined from P and Q or depend on the 
derivatives of the fields, not just their values. To see that P and Q 
are complete in this sense, we can break the possibilities down into 
cases, according to whether P and Q are zero or nonzero, positive 
or negative. As a representative example, consider the case where 
P < 0 and Q > 0. First we rotate our frame of reference so that E 
is along the x axis, and B lies in the x-y plane. Next we do a boost 
along the z axis in order to eliminate the y component of B; the 
field transformation equations on p. 221 make this possible because 
$|\mathbf{E}| > |\mathbf{B}|$. The result is that we have found a frame of reference in
which $\mathbf{E}$ and $\mathbf{B}$ both lie along the positive x axis. The only
frame-independent information that there is to know is the information
available in this frame, and that consists of only two positive real
numbers, $E_x$ and $B_x$, which can be determined from the values of
$P$ and $Q$.


A static null field Example 5 

Although an electromagnetic plane wave is a null field, the converse
is not true. For example, we can create a static null field
out of a static, uniform electric field and a static, uniform magnetic
field, with the two fields perpendicular to one another and equal in
magnitude, so that both $P$ and $Q$ vanish.


Another invariant? Example 6 

Let $\Pi$ be the squared magnitude of the Poynting vector,
$\Pi = (\mathbf{E} \times \mathbf{B}) \cdot (\mathbf{E} \times \mathbf{B})$. Since $\Pi$ can be expressed in terms of
dot products and cross products, it is guaranteed to be invariant
under rotations. However, it is not a relativistic invariant. For
example, if we do a Lorentz boost parallel to the direction of an
electromagnetic wave, the intensity of the wave changes, and so
does $\Pi$.


A non-null invariant for electromagnetic waves? Example 7 
The quantity $Q^{-1} = 1/(\mathbf{E} \cdot \mathbf{B})$ is clearly an invariant, and it doesn’t
vanish for an electromagnetic plane wave; in fact, it is infinite
for a plane wave. Does this contradict our proof that any invariant
must vanish for a plane wave? No, because we only proved this in
the case where the invariant is defined as a continuous function of
$\mathcal{F}$. Our function $Q^{-1}$ is a discontinuous function of $\mathcal{F}$ when $\mathcal{F} = 0$.
Such discontinuous invariants tend not to be very interesting. For
suppose we try to measure $Q^{-1}$, and the thing we’re measuring
happens to be an electromagnetic wave. Our measurements of
the fields will probably be statistically consistent with zero, and
therefore the error bars on our measurement of $Q^{-1}$ will likely be
infinitely large.

10.6 Stress-energy tensor of the electromagnetic field

The electromagnetic field has a stress-energy tensor associated with
it. From our study of electromagnetism we know that the electromagnetic
field has energy density $U = (E^2 + B^2)/8\pi k$ and momentum
density $\mathbf{S} = (\mathbf{E} \times \mathbf{B})/4\pi k$ (in units where $c = 1$, with $k$ being
the Coulomb constant). This fixes the components of the stress-energy
tensor of the form $T^{t\mu}$ and $T^{\mu t}$, i.e., the top row and left
column.

e / Pressure and tension in
electrostatic fields.


The following argument tells us something about what to expect
for the components $T^{xx}$, $T^{yy}$, and $T^{zz}$, which are interpreted as
pressures or tensions, depending on their signs. In figure e/1, the
capacitor plates want to collapse against each other in the vertical
(y) direction, but at the same time the internal repulsions within
each plate make that plate want to expand in the x direction. If
the capacitor is built out of materials that hold their shape, then
the electromagnetic tension $T^{yy} < 0$ is counteracted by pressure
$T^{yy} > 0$ in the materials, while the electromagnetic pressure $T^{xx} > 0$
is canceled by the materials’ tension $T^{xx} < 0$. We got these results
for a particular physical situation, but relativity requires that the
stress-energy be defined at every point based on the fields at that
point, so our conclusions must hold generally. In e/2 and e/3, white
boxes have been drawn in regions where the total field is strong
and the fields are strongly interacting. In e/2, there is tension in
the x direction and pressure in y; the tension can be thought of as
contributing to the attraction between the opposite charges. In e/3,
there is also x tension and y pressure; the pressure contributes to
the like charges’ repulsion.

To make this more quantitative, consider the discontinuity in $E_y$
at the upper plate in figure e/1. The field abruptly switches from
0 on the outside to some value $E$ between the plates. By Gauss’s
law, the charge per unit area on the plate must be $\sigma = E/4\pi k$.
The average field experienced by the charge in the plate is
$\bar{E} = (0 + E)/2 = E/2$, so the force per unit area, i.e., the tension in the
field, is $\sigma\bar{E} = E^2/8\pi k$. Thus we expect $T^{yy} = -E^2/8\pi k$ if $\mathbf{E}$ is
along the y axis.

For the reader who wants the full derivation of the remaining 
nine components of the tensor, we now give an argument that makes 
use of the following list of its properties. Other readers can skip 
ahead to where the full tensor is presented. 

1. $T$ is symmetric, $T^{\mu\nu} = T^{\nu\mu}$.

2. The components must be second-order in the fields, e.g., we
can have terms like $E_x B_z$, but not terms of any other order, such
as $E_x B_z B_y$. This is because Maxwell’s equations are linear, and when a wave
equation is perfectly linear, the corresponding energy expression
is second-order in the amplitude of the wave.

3. T has the parity properties described in example 6 on p. 181. 

4. The electric and magnetic fields are treated symmetrically in
Maxwell’s equations, so they should be treated symmetrically
in the stress-energy tensor. E.g., we could have a term like
$7E^2 + 7B^2$, but not $7E^2 + 6B^2$.

5. On p. 187 of section 9.2.8, we saw that the trace energy condition
$T^a{}_a \ge 0$ is satisfied by a cloud of dust if and only if
the dust’s mass-energy is not transported at a speed greater
than $c$. In section 4.1, we saw that all ultrarelativistic particles
have the same mechanical properties. Since a cloud of
dust, in the limit where its speed approaches $c$, is on the edge
of the bound set by the trace energy condition, $T^a{}_a \to 0$, we
expect that the electromagnetic field, in which disturbances
propagate at $c$, should also exactly saturate the trace energy
condition, so that $T^a{}_a = 0$.

6. The stress-energy tensor should behave properly under rota- 
tions, which basically means that x, y, and z should be treated 
symmetrically. 

7. An electromagnetic plane wave propagating in the x direction 
should not exert any pressure in the y or z directions. 

8. If the field obeys Maxwell’s equations, then the energy-conservation
condition $\partial T^{ab}/\partial x^a = 0$ should hold.

These facts are enough to completely determine the form of the
remaining nine components of the stress-energy tensor. Property 3
requires that all of these components be even under parity. Since
electric fields flip under parity but magnetic fields don’t (example 1,
p. 220), these components can only have terms like $E_i E_j$ and $B_i B_j$,
not mixed terms like $E_i B_j$. Taking into account properties 4 and 6,
we find that the diagonal terms must look like

\[ 4\pi k T^{xx} = a(E_x^2 + B_x^2) + b(E^2 + B^2), \]

and the off-diagonal ones

\[ 4\pi k T^{xy} = c(E_x E_y + B_x B_y). \]

Property 5 gives $1/2 - a - 3b = 0$ and 7 gives $b = -a/2$, so we have
$a = -1$ and $b = 1/2$. The determination of $c = -1$ is left as an
exercise, problem 4 on p. 237.
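
The two conditions on $a$ and $b$ can be solved and checked with exact arithmetic. The following minimal Python sketch assumes the diagonal form $4\pi k T^{ii} = a(E_i^2 + B_i^2) + b(E^2 + B^2)$ and the $+---$ signature for the trace. The function names are hypothetical.

```python
from fractions import Fraction

def solve_a_b():
    """Solve a + 3b = 1/2 (property 5) and a + 2b = 0 (property 7) exactly."""
    # Subtracting the second equation from the first isolates b.
    b = Fraction(1, 2) - Fraction(0)
    a = -2 * b
    return a, b

def four_pi_k_T_diag(a, b, E, B, i):
    """4*pi*k*T^{ii} for spatial axis i (0, 1, 2 = x, y, z), from the assumed form."""
    E2 = sum(e * e for e in E)
    B2 = sum(c * c for c in B)
    return a * (E[i] ** 2 + B[i] ** 2) + b * (E2 + B2)

def four_pi_k_trace(a, b, E, B):
    """4*pi*k*T^a_a = 4*pi*k*(T^tt - T^xx - T^yy - T^zz) in the +--- signature."""
    E2 = sum(e * e for e in E)
    B2 = sum(c * c for c in B)
    T_tt = Fraction(1, 2) * (E2 + B2)  # 4*pi*k times the energy density U
    return T_tt - sum(four_pi_k_T_diag(a, b, E, B, i) for i in range(3))
```

With the solved values, the trace should vanish for arbitrary fields, a plane wave with $\mathbf{E}$ along y and $\mathbf{B}$ along z should have no pressure in y or z, and a field $\mathbf{E}$ along y alone should give $4\pi k T^{yy} = -E^2/2$, matching the capacitor estimate $T^{yy} = -E^2/8\pi k$.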
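
We have now established the complete expression for the stress-energy
tensor of the electromagnetic field, which is

\[
T^{\mu\nu} =
\begin{pmatrix}
U & S_x & S_y & S_z \\
S_x & -\sigma_{xx} & -\sigma_{xy} & -\sigma_{xz} \\
S_y & -\sigma_{yx} & -\sigma_{yy} & -\sigma_{yz} \\
S_z & -\sigma_{zx} & -\sigma_{zy} & -\sigma_{zz}
\end{pmatrix}, \qquad (1)
\]

where $U = (E^2 + B^2)/8\pi k$ and $\mathbf{S} = (\mathbf{E} \times \mathbf{B})/4\pi k$ are the energy
density and momentum density,
and $\sigma$, known as the Maxwell stress tensor, is given by

\[ \sigma_{ij} = \frac{1}{4\pi k}\left(E_i E_j + B_i B_j\right) - \frac{1}{8\pi k}\left(E^2 + B^2\right)\delta_{ij}. \]

All of this can be expressed more compactly and in a coordinate-independent
way as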

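\[ T^{ab} = -\frac{o^d o_d}{4\pi k}\left(\mathcal{F}^{ac}\mathcal{F}^{b}{}_{c} - \frac{1}{4} g^{ab}\mathcal{F}_{ef}\mathcal{F}^{ef}\right), \qquad (2) \]

where $o$ is a future-directed velocity vector, so that $o^d o_d = +1$ for
the signature $+---$ used in this book, and $-1$ if the signature is
$-+++$.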
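
As a cross-check between the component form and the compact form, here is a minimal Python sketch. It assumes the $+---$ signature with $c = 1$, for which the compact expression reads $T^{ab} = (1/4\pi k)\left(-\mathcal{F}^{ac}\mathcal{F}^{b}{}_{c} + \frac{1}{4} g^{ab}\mathcal{F}_{ef}\mathcal{F}^{ef}\right)$, and it takes the Maxwell stress tensor to be $\sigma_{ij} = (E_i E_j + B_i B_j)/4\pi k - \delta_{ij}(E^2 + B^2)/8\pi k$. Both forms and all function names are assumptions of the sketch.

```python
import math

# Diagonal of the Minkowski metric, signature +---, units with c = 1.
METRIC = (1.0, -1.0, -1.0, -1.0)

def field_tensor_upper(E, B):
    """Upper-index components F^{mu nu}, with E in the left column."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]

def stress_energy_covariant(E, B, k=1.0):
    """T^{ab} from index contractions of the field tensor (signature +---)."""
    F = field_tensor_upper(E, B)
    # Scalar F_ef F^ef: lowering both indices multiplies by g_ee * g_ff.
    FF = sum(METRIC[e] * METRIC[f] * F[e][f] ** 2
             for e in range(4) for f in range(4))
    T = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            # F^{ac} F^b_c = sum over c of F^{ac} F^{bc} g_cc
            FaFb = sum(F[a][c] * F[b][c] * METRIC[c] for c in range(4))
            g_ab = METRIC[a] if a == b else 0.0
            T[a][b] = (-FaFb + 0.25 * g_ab * FF) / (4.0 * math.pi * k)
    return T

def stress_energy_components(E, B, k=1.0):
    """T^{mu nu} assembled from U, S, and the Maxwell stress tensor sigma."""
    E2 = sum(e * e for e in E)
    B2 = sum(c * c for c in B)
    U = (E2 + B2) / (8.0 * math.pi * k)
    S = [(E[1] * B[2] - E[2] * B[1]) / (4.0 * math.pi * k),
         (E[2] * B[0] - E[0] * B[2]) / (4.0 * math.pi * k),
         (E[0] * B[1] - E[1] * B[0]) / (4.0 * math.pi * k)]
    T = [[U, S[0], S[1], S[2]]]
    for i in range(3):
        row = [S[i]]
        for j in range(3):
            sigma = (E[i] * E[j] + B[i] * B[j]) / (4.0 * math.pi * k)
            if i == j:
                sigma -= (E2 + B2) / (8.0 * math.pi * k)
            row.append(-sigma)
        T.append(row)
    return T
```

The two functions should agree entry by entry for any choice of fields. For a plane wave with $\mathbf{E}$ along y, $\mathbf{B}$ along z, and $|\mathbf{E}| = |\mathbf{B}| = A$, both should give $A^2/4\pi k$ in the tt, tx, xt, and xx slots and zero elsewhere.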


Stress-energy tensor of a plane wave Example 8 

Let an electromagnetic plane wave (not necessarily sinusoidal)
propagate along the x axis, with its polarization such that $\mathbf{E}$ is in
the y direction and $\mathbf{B}$ on the z axis, and $|\mathbf{E}| = |\mathbf{B}| = A$. Then we
have the following for the stress-energy tensor.

\[
T^{\mu\nu} = \frac{A^2}{4\pi k}
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

The $T^{tt}$ component tells us that the wave has a certain energy
density. Because the wave is massless, we have $E^2 - p^2 = m^2 = 0$
(where $E$ here is the energy, not the field), so the momentum density
is the same as the energy density,
and $T^{tx}$ is the same as $T^{tt}$. If this wave strikes a surface in the
yz plane, the momentum the surface absorbs from the wave will
be felt as a pressure, represented by $T^{xx}$.

In example 5 on p. 181, we saw that a cloud of dust, viewed in a
frame moving at velocity $v$ relative to the dust’s rest frame, had
the following stress-energy tensor.

\[
T^{\mu'\nu'} =
\begin{pmatrix}
\gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\
\gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

In the ultrarelativistic limit $v \to 1$, this becomes

\[
T^{\mu'\nu'} = (\text{energy density}) \times
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},
\]


which is exactly the same as the result for our electromagnetic 
wave. This illustrates the fact discussed in section 4.1 that all 
ultrarelativistic particles have the same mechanical properties. 

Mass of a capacitor Example 9 

Consider the mass of a charged parallel-plate capacitor, figure 
f/1, first in its rest frame and then in a frame boosted in the di- 
rection parallel to the field (perpendicular to the plates). If we’re 
not careful, we run into the following paradox. Under a boost, an 
electric field parallel to the boost remains unchanged. Therefore 
in the boosted frame, we have exactly the same field strength, 
but filling a volume that has been decreased by length contrac- 
tion. Therefore the mass-energy of the capacitor is greatest in its 
own rest frame, which is absurd and would contradict our proof 
in section 9.3.4 that the energy-momentum of an isolated system 
transforms as a four-vector. 

The resolution of the paradox comes from recognizing that we assumed
the capacitor to be in static equilibrium, but we ignored the
stress-energy of whatever mechanical supports were maintaining
this equilibrium. If we consider only the stress-energy $T_{(em)}$ of
the electromagnetic field, then we have $T^{tt}_{(em)} = (1/8\pi k)E^2$ (energy
density) and $T^{yy}_{(em)} = -(1/8\pi k)E^2$ (tension in the y direction,
parallel to the field), figure f/2. It’s easy to see that this has a
nonvanishing divergence, since $\partial_y T^{yy}_{(em)} \ne 0$ at the plates, and
there are no other terms in the stress-energy tensor that could
compensate for this.

There is nothing surprising here; only the total stress-energy tensor
$T$ has to be divergenceless, not $T_{(em)}$. It would violate the
laws of physics if the capacitor were to remain in equilibrium like
this without some force to counter the electromagnetic tension.
Let’s say that this force is provided by a spring, as in figure f/3.
The spring has its own contribution $T_{(s)}$ to the stress-energy. For
convenience, let’s imagine making the spring filled in (rather than
a hollow cylinder) and fattening it up so that it fills the entire interior
volume of the capacitor. Then to achieve static equilibrium in
the rest frame, we need the pressure in the spring to cancel out
the tension in the electric field. We therefore have $T^{yy} = 0$ for
the total stress-energy tensor.

If we now apply the tensor transformation law to the stress-energy
tensor, we find that the stress-energy tensor in the boosted frame
contains a mass-energy density $T^{t't'}$ that depends only on $T^{tt}$
and $T^{yy}$. (There also has to be an xx component to keep the
plates from exploding laterally, but that doesn’t enter here.) But
we have $T^{yy} = 0$, so the problem is exactly the same as transforming
a lump of nonrelativistic matter, and we know that that
calculation comes out OK. For an explicit demonstration that this
still works out if we drop the simplifying assumption that the spring
fills the entire interior volume of the capacitor, see Rindler and Denur,
“A simple relativistic paradox about electrostatic energy,” Am
J Phys 56 (1988) 9.

[Figure f, referred to in example 9; panel 2 is labeled “pressure” and “tension.”]
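
The paradox and its resolution can be made concrete with a few lines of arithmetic. The sketch below is a minimal Python illustration, assuming a boost along y at speed $v$ with $c = 1$, no momentum density in the rest frame ($T^{ty} = 0$), and a uniform stress-energy filling a rest-frame volume. Under those assumptions the tensor transformation law gives $T^{t't'} = \gamma^2(T^{tt} + v^2 T^{yy})$. The function name is hypothetical.

```python
import math

def boosted_energy(T_tt, T_yy, rest_volume, v):
    """Total energy in a frame boosted along y at speed v (c = 1, T^{ty} = 0)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    density = gamma ** 2 * (T_tt + v * v * T_yy)  # T^{t't'} in the new frame
    return density * rest_volume / gamma           # volume is length-contracted
```

If only the electromagnetic part is included, so that $T^{yy} = -T^{tt}$, the result is the rest energy divided by $\gamma$, which is the paradoxical decrease. With the total $T^{yy} = 0$, the result is $\gamma$ times the rest energy, as expected for the time component of a four-vector.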

10.7 Maxwell’s equations 

10.7.1 Statement and interpretation 

In this book I assume that you’ve had the usual physics back- 
ground acquired in a freshman survey course, which includes an 
initial, probably frightening, encounter with Maxwell’s equations in 
integral form. In units with c = 1, Maxwell’s equations are: 


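\[
\begin{aligned}
\Phi_E &= 4\pi k q & \text{(3a)} \\
\Phi_B &= 0 & \text{(3b)} \\
\oint \mathbf{E} \cdot d\boldsymbol{\ell} &= -\frac{\partial \Phi_B}{\partial t} & \text{(3c)} \\
\oint \mathbf{B} \cdot d\boldsymbol{\ell} &= \frac{\partial \Phi_E}{\partial t} + 4\pi k I & \text{(3d)}
\end{aligned}
\]

where

\[
\begin{aligned}
\Phi_E &= \int \mathbf{E} \cdot d\mathbf{a} \quad \text{and} & \text{(4)} \\
\Phi_B &= \int \mathbf{B} \cdot d\mathbf{a}. & \text{(5)}
\end{aligned}
\]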

Equations (3a) and (3b) refer to a closed surface and the charge $q$
contained inside that surface. Equation (3a), Gauss’s law, says that
charges are the sources of the electric field, while (3b) says that
magnetic “charges” don’t exist. Equations (3c) and (3d) refer to
a surface like a potato chip, which has an edge or boundary, and
the current $I$ passing through that surface, with the line integrals
being evaluated along that boundary. The right-hand side of
(3c) says that a changing magnetic field produces a curly electric
field, as in a generator or a transformer. The $I$ term in (3d) says
that currents create magnetic fields that curl around them. The
$\partial\Phi_E/\partial t$ term, which says that changing electric fields create magnetic
fields, is necessary so that the equations produce consistent
results regardless of the surfaces chosen, and is also part of the apparatus
responsible for the existence of electromagnetic waves, in
which the changing $\mathbf{E}$ field produces the $\mathbf{B}$, and the changing $\mathbf{B}$
makes the $\mathbf{E}$.

Equations (3a) and (3b) have no time-dependence. They function
as constraints on the possible field patterns. Equations (3c)
and (3d) are dynamical laws that predict how an initial field pattern
will evolve over time. It can be shown that if (3a) and (3b) are
satisfied initially, then (3c) and (3d) ensure that they will continue
to be satisfied later. Because the dynamical laws consist of two vector
equations, they provide a total of 6 constraints, which is the
number needed in order to predict the behavior of the 6 fields $E_x$,
$E_y$, $E_z$, $B_x$, $B_y$, and $B_z$.

10.7.2 Experimental support 

Before Einstein’s 1905 paper on relativity, the known laws of
physics were Newton’s laws and Maxwell’s equations (3a)-(3d).
Experiments such as example 4 on p. 85 show that Newton’s laws
are only low-velocity approximations. Maxwell’s equations are not
low-velocity approximations; for example, in section 1.3.1 we noted
the evidence that atoms are electrically neutral, in agreement with
Gauss’s law, (3a), to one part in $10^{21}$, even though the electrons in
atoms typically have velocities on the order of 1-10% of $c$.

10.7.3 Incompatibility with Galilean spacetime 

Maxwell’s equations are not compatible with the Galilean de- 
scription of spacetime (section 1.1.2, p. 13). If we assume that 
equations (3) hold in some frame o, and then apply a Galilean
boost, transforming the coordinates $(t, x, y, z)$ to
$(t', x', y', z') = (t, x - vt, y, z)$, we find that in frame o' the equations have a different
and more complicated form that cannot be simplified so as
to look like the form they had in o. Rather than writing out the 
resulting horrible mess and verifying that it can’t be cast back into 
the simpler form, an easier way to prove this is to note that there 
are solutions to the equations in o that are not solutions after a 
Galilean boost into o', if we try to keep the equations in the same 
form. For example, if a light wave propagates in the x direction at 
speed c in o, then after a boost with v = c, we would have a light 
wave in frame o' that was standing still. (This is Einstein’s thought 
experiment of riding alongside a light wave on a motorcycle, p. 13.) 
Such a wave would violate (3c), since the left-hand side would be 
nonzero for a surface lying in the xy plane, but the time derivative 
on the right-hand side would be zero. 

10.7.4 Not manifestly relativistic in their original form 

Since Maxwell’s equations are not low-velocity approximations 
and are incompatible with Galilean relativity, we expect with the 
benefit of historical hindsight that they are compatible with the 
relativistic picture of spacetime. But when they are expressed in
the form (3), they have two features, either one of which seems 
enough to make them completely incompatible with relativity: 



g / A magnetic field that violates $\Phi_B = 0$.



h / 1. An electron jumps through 
a hoop. 2. An alternative surface 
spanning the hoop. 


(i) They appear to describe instantaneous action at a distance.
For example, Gauss’s law, $\Phi_E = 4\pi k q$, relates the electric field
in one place (on the closed surface) to the electric charge somewhere
else (inside the surface). This nonlocal structure smells
wrong relativistically, for the reasons discussed in section 10.2.

(ii) They appear to treat time and space asymmetrically. 

What’s really happening here is that equations (3) are like a version 
of Hamlet written in crayon on a long strip of toilet paper. They are 
completely relativistic, but have been written in a form that hides 
that fact. 

The problem of nonlocality, i, can be shown to be a non-issue
because Maxwell’s equations can be reworked into a form in which
they are purely local. The idea is shown in figure g. The magnetic
field lines all form closed loops, except for one of them, which begins
at a point in space and extends off to infinity. Drawing the large
box, 1, we find that $\Phi_B$, the flux of the magnetic field through
the box, is not zero, because a line leaves the box but none come
in. But the same discrepancy could have been detected with the
smaller box 2, or in fact with an arbitrarily small box containing
the source of the field line. In other words, the equation $\Phi_B = 0$ is
nonlocal, but if it is to hold for any surface, then it must also hold
locally, in the limit of an arbitrarily small surface. This purely local
law of physics can be expressed using the three-dimensional version
of the divergence, introduced on p. 178:

\[ \frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} = 0. \]

Of the four Maxwell’s equations, both (3a) and (3b) can
be reexpressed in this way. This book neither presents the full machinery
of vector calculus nor assumes previous knowledge of it, but
a similar limiting procedure can also be applied to equations (3c)
and (3d), using an operator called the curl.
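
The limiting procedure, in which the flux through an ever smaller box becomes a local statement, can be illustrated numerically. The following minimal Python sketch estimates the flux through a small cube with a one-point midpoint rule on each face and divides by the cube's volume, which approximates the divergence at the cube's center. The two sample fields and all names are hypothetical, chosen only for illustration.

```python
def flux_per_volume(field, center, half_width):
    """Net outward flux of `field` through a small cube, divided by its volume."""
    h = half_width
    face_area = (2.0 * h) ** 2
    flux = 0.0
    for axis in range(3):
        for sign in (1.0, -1.0):
            point = list(center)
            point[axis] += sign * h
            # Midpoint rule: sample the outward normal component at the face center.
            flux += sign * field(*point)[axis] * face_area
    return flux / (2.0 * h) ** 3

def loop_field(x, y, z):
    """Field lines are closed circles around the z axis; the divergence is zero."""
    return (-y, x, 0.0)

def source_field(x, y, z):
    """Field lines radiate outward from the origin; the divergence is 3 everywhere."""
    return (x, y, z)
```

A field made only of closed loops, like `loop_field`, gives zero no matter how small the box is, while a field with lines that begin somewhere, like `source_field`, gives a nonzero value even in the limit of a tiny box.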

The following example is one in which both problem i and prob- 
lem ii turn out not to be problems. 

Jumping through a hoop Example 1 0 

Here is an example in which the non-obvious features of Maxwell’s 
equations prevent the antirelativistic meltdown projected in i. In 
figure h/1, an electron jumps back and forth through an imagi- 
nary circular hoop, across which we construct an imaginary flat 
surface. Every time the electron pierces the surface, it makes a 
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momentary spike in the current /, which appears in (3d), 

r d® F 
B • d£ = + AnkL 

We might expect that this would cause the field B detected on the 
edge of the disk to show similar spikes at the same times. But 
“same times” implies some notion of simultaneity, and this would 
be incompatible with relativity, since the t coordinate being re- 
ferred to here is just one observer’s notion of time. Furthermore, 
it would seem that information was being transmitted instanta- 
neously from the center of the disk to its edge, which violates 
relativity (p. 17). 

Stranger still, we can produce an apparent paradox without even 
appealing to relativity. Instead of the flat surface in h/1, we can 
pick a dish-shaped one, h/2, with a deep enough curve so that the 
electron never crosses it. The current / is always zero according 
to this surface, so that no field B would be produced at the rim at 
all. 

The resolution of all these difficulties lies in the term $\partial\Phi_E/\partial t$,
which we’ve ignored. With surface 1, the electron crosses the
surface in time $\delta t$, causing a current $I = e/\delta t$ but also causing a
change in the flux from $\Phi_E \approx 2\pi k e$ to $\Phi_E \approx -2\pi k e$. The result
is that the right-hand side of the equation is nearly zero. With
surface 2, $I = 0$ and $\partial\Phi_E/\partial t \approx 0$, so the right-hand side is again
nearly zero.
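
The cancellation for surface 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the flat surface is taken to be a disk of radius 1 with its normal along +z, the charge moves along the axis in the +z direction, and k and the charge are set to 1. The function names and the numerical values are mine, not part of the text. The flux is found from the solid angle that the disk subtends as seen from the charge.

```python
import math

def disk_flux(z, radius=1.0, k=1.0, q=1.0):
    """Electric flux through a flat disk (normal along +z) due to a point
    charge q on the disk's axis at height z relative to its center."""
    if z == 0:
        raise ValueError("charge lies exactly in the surface; flux is undefined")
    # Solid angle subtended by the disk as seen from the charge.
    solid_angle = 2.0 * math.pi * (1.0 - abs(z) / math.hypot(z, radius))
    # A charge above the disk sends its field lines downward through it,
    # so the flux relative to the +z normal is negative for z > 0.
    return -math.copysign(1.0, z) * k * q * solid_angle

def rhs_of_3d(z_before, z_after, dt, radius=1.0, k=1.0, q=1.0):
    """dPhi_E/dt + 4 pi k I, averaged over a crossing that takes time dt."""
    dflux_dt = (disk_flux(z_after, radius, k, q)
                - disk_flux(z_before, radius, k, q)) / dt
    current = q / dt  # charge q passes through the surface in time dt
    return dflux_dt + 4.0 * math.pi * k * current

eps, dt = 1e-6, 1e-3           # illustrative values, not taken from the text
flux_before = disk_flux(-eps)  # close to +2 pi k q
flux_after = disk_flux(+eps)   # close to -2 pi k q
spike = 4.0 * math.pi / dt     # size of the current term alone
residual = rhs_of_3d(-eps, +eps, dt)
print(flux_before, flux_after, residual / spike)
```

The last printed number compares what is left of the right-hand side with the size of the current spike by itself. It shrinks as the start and end points are moved closer to the surface.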

When the approximations used above are eliminated, Maxwell’s 
equations do predict a nonvanishing field, which is the expected 
electromagnetic wave propagating away from the electron at the 
proper speed c. 

10.7.5 Lorentz invariance 

Example 10 might seem like a “just-so story,” but the appar- 
ently miraculous resolution is not a coincidence. It happens because 
Maxwell’s equations are in fact invariant under a Lorentz transfor- 
mation, even though that isn’t obvious when they’re written in the 
form (3a)-(3d). There are various ways of showing this: 


• Einstein did it by brute force in his 1905 paper on relativity, 
by transforming the coordinates through a Lorentz transfor- 
mation and the fields as in section 10.4. 

• Maxwell’s equations are basically wave equations. (They have 
both wave solutions and static solutions.) We can verify that 
when we start with a sinusoidal plane wave in one frame, then
transform into another frame, the result is again a valid sine-wave
solution, having been subjected to a Doppler shift (section 3.2)
and aberration (section 6.5). This requires checking
that the wave is still purely transverse, but that follows easily
from examining the invariants described in section 10.5.
By a celebrated mathematical result called Fourier’s theorem, 
any well-behaved wave can be written as a sum of sine waves, 
and therefore any wave solution of Maxwell’s equations in one 
frame is also a solution in every other frame. 

• Maxwell’s equations can be rewritten in terms of tensors, obey- 
ing all the grammatical rules of index gymnastics. If they can 
be written in this form, they are automatically Lorentz invari- 
ant. 
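
The second approach in the list above can be spot-checked numerically for one wave. This is a sketch, not a proof. It assumes units with c = 1 in which E and B have the same units, a boost along x, and the standard transformation rules for the fields and for the frequency and wave vector, which I take to be the ones meant in section 10.4. The function names and the sample wave are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def boost_fields_x(E, B, v):
    """Fields seen in a frame moving at velocity v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    E_new = (E[0], gamma * (E[1] - v * B[2]), gamma * (E[2] + v * B[1]))
    B_new = (B[0], gamma * (B[1] + v * E[2]), gamma * (B[2] - v * E[1]))
    return E_new, B_new

def boost_wave_x(omega, k, v):
    """Frequency and wave vector in the same moving frame (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (omega - v * k[0]), (gamma * (k[0] - v * omega), k[1], k[2])

def plane_wave(theta, omega):
    """Field amplitudes of a vacuum plane wave traveling at angle theta to x."""
    n = (math.cos(theta), math.sin(theta), 0.0)
    E = (-0.3 * n[1], 0.3 * n[0], 0.5)  # any vector perpendicular to n
    B = cross(n, E)                     # vacuum plane wave: B = n x E
    k = tuple(omega * c for c in n)
    return E, B, k

def transversality_report(E, B, omega, k):
    """Quantities that should all vanish for a transverse vacuum wave."""
    return {"E.B": dot(E, B),
            "E^2-B^2": dot(E, E) - dot(B, B),
            "E.k": dot(E, k),
            "B.k": dot(B, k),
            "omega^2-k^2": omega ** 2 - dot(k, k)}

E, B, k = plane_wave(theta=0.7, omega=2.0)
E2, B2 = boost_fields_x(E, B, v=0.6)
omega2, k2 = boost_wave_x(2.0, k, v=0.6)
report = transversality_report(E2, B2, omega2, k2)
for name, value in report.items():
    print(f"{name:12s} {value:+.2e}")
```

Every entry of the report should vanish up to rounding error, while the frequency and the direction of the wave vector both change. That is the Doppler shift and aberration mentioned in the list.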


The last approach is the most general and elegant, so we’ll provide
a brief sketch of how it works. Equation (3a) has $4\pi k$ times the
charge on the right, while (3d) has $4\pi k$ times the current. These both
relate to the current four-vector J, so clearly we need to combine
them somehow into a single equation with J on the right. Since
the local form of equation (3a) involves the three-dimensional di- 
vergence, which contains first derivatives, the left-hand side of this 
combined equation should have a first derivative in it. Given the 
grammatical rules of tensors and index gymnastics, we don’t have 
many possible ways to accomplish this. The only obvious thing to 
try is 


$$\frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu} = 4\pi k J^\mu. \qquad (6)$$


Writing this out for $\mu$ being the time coordinate, we get a relation
that equates the divergence of E to $4\pi k$ times the charge density; this
is the local equivalent of (3a). If you’ve taken vector calculus and
know about the curl operator and Stokes’ theorem, then you can
verify that for $\mu$ referring to x, y, and z, we recover the local form
of (3d). The tensorial way of expressing (3b) and (3c) turns out to 
be 


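$$\frac{\partial \mathcal{F}_{\mu\nu}}{\partial x^\lambda} + \frac{\partial \mathcal{F}_{\nu\lambda}}{\partial x^\mu} + \frac{\partial \mathcal{F}_{\lambda\mu}}{\partial x^\nu} = 0. \qquad (7)$$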


i / Example 11.


A generator Example 11

Figure i shows a crude, impractical generator, depicted in two
frames of reference.

Flea 1 is sitting on top of the bar magnet, which creates the magnetic
field pattern shown with the arrows. To her, the bar magnet
is obviously at rest, and this magnetic field pattern is static. As
the square wire loop is dragged away from her and the magnet,
its protons experience a force in the -z direction, as determined
by the Lorentz force law F = qv × B. The electrons, which are
negatively charged, feel a force in the +z direction. The conduc- 
tion electrons are free to move, but the protons aren’t. In the front 
and back sides of the loop, this force is perpendicular to the wire. 
In the right and left sides, however, the electrons are free to re- 
spond to the force. Since the magnetic field is weaker on the right 
side, current circulates around the loop. 

Flea 2 is sitting on the loop, which she considers to be at rest. In 
her frame of reference, it’s the bar magnet that is moving. Like 
flea 1, she observes a current circulating around the loop, but 
unlike flea 1, she cannot use magnetic forces to explain this cur- 
rent. As far as she is concerned, the electrons were initially at 
rest. Magnetic forces are forces between moving charges and 
other moving charges, so a magnetic field can never accelerate 
a charged particle starting from rest. A force that accelerates a 
charge from rest can only be an electric force, so she is forced 
to conclude that there is an electric field in her region of space. 
This field drives electrons around and around in circles — it is a 
curly field. What reason can flea 2 offer for the existence of this 
electric field pattern? Well, she’s been noticing that the magnetic 
field in her region of space has been changing, possibly because 
that bar magnet over there has been getting farther away. She 
observes that a changing magnetic field creates a curly electric 
field. Thus the $\partial\Phi_B/\partial t$ term in equation (3c) is not optional; it is
required to exist if Maxwell’s equations are to be equally valid in 
all frames. 
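
The two fleas’ descriptions can be compared with a few lines of arithmetic. This is a rough sketch under assumptions of my own: units with c = 1, a field that is purely magnetic in flea 1’s frame, a loop velocity along x, and made-up numbers that are not read off figure i. It uses the standard rule that a purely magnetic field B acquires an electric part γ v × B in a frame moving at velocity v.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def electric_field_in_moving_frame(B, v):
    """Electric field seen in a frame moving at velocity v along x through
    a field that is purely magnetic in the original frame (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (0.0, -gamma * v * B[2], gamma * v * B[1])

B = (0.0, 0.4, 0.0)  # hypothetical field at one point of the loop, flea 1's frame
v = 1e-3             # hypothetical speed of the loop, as a fraction of c

# Flea 1: magnetic force per unit charge on a charge carried along with the loop.
force_per_charge_flea1 = cross((v, 0.0, 0.0), B)

# Flea 2: the charge starts at rest, so only an electric field can push it.
E_flea2 = electric_field_in_moving_frame(B, v)

print(force_per_charge_flea1, E_flea2)
```

At low speeds the two vectors agree, so both observers predict the same push on the charges even though one calls it magnetic and the other electric. The leftover factor of γ is the usual one relating transverse forces in the two frames.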

Einstein’s 1905 paper on relativity¹ begins with this sentence:
“It is known that Maxwell’s electrodynamics — as usually
understood at the present time — when applied to moving bodies, 
leads to asymmetries which do not appear to be inherent in the 
phenomena.” He then gives essentially this example. Although
the observers in frames 1 and 2 agree on all physical measure- 
ments, their explanations of the physical mechanisms, couched 
in the language of Maxwell’s equations in the form (3), are com- 
pletely different. In relativistic language, flea 2’s explanation can 
be written in terms of equation (7), in the case where the indices 
are x, z, and t: 


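$$\frac{\partial \mathcal{F}_{xz}}{\partial t} + \frac{\partial \mathcal{F}_{zt}}{\partial x} + \frac{\partial \mathcal{F}_{tx}}{\partial z} = 0,$$

which is the same as

$$\frac{\partial B_y}{\partial t} - \frac{\partial E_z}{\partial x} + \frac{\partial E_x}{\partial z} = 0.$$

1 “Zur Elektrodynamik bewegter Körper,” Annalen der Physik 17 (1905)
891. Translation by Perrett and Jeffery.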

Because the first term is negative, the second term must be pos- 
itive. Since equations (6) and (7) are written in terms of tensors, 
obeying the grammatical rules of index gymnastics, we are guar- 
anteed that they give consistent predictions in all frames of refer- 
ence. 

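Conservation of charge and energy-momentum Example 12
Solving equation (6) for the current vector, we have

$$J^\mu = \frac{1}{4\pi k}\frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu}.$$

Conservation of charge (section 9.1.2, p. 178) can be expressed as

$$\frac{\partial J^\mu}{\partial x^\mu} = 0.$$

If we substitute the first equation into the second, we obtain

$$\frac{\partial}{\partial x^\mu}\left(\frac{1}{4\pi k}\frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu}\right) = 0,$$

or

$$\frac{\partial^2 \mathcal{F}^{\mu\nu}}{\partial x^\mu\,\partial x^\nu} = 0,$$

with a sum over both $\mu$ and $\nu$. But this equation is automatically
satisfied because $\mathcal{F}$ is antisymmetric, so for every combination of
indices $\mu$ and $\nu$, the term involving $\mathcal{F}^{\mu\nu}$ is canceled by one
containing $\mathcal{F}^{\nu\mu} = -\mathcal{F}^{\mu\nu}$. Thus conservation of charge does not have
to be added as a supplementary condition in addition to Maxwell’s
equations; it is automatically implied by Maxwell’s equations.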
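
The pairwise cancellation can be seen in a small numerical stand-in. In this sketch, which is my own illustration and not the text’s method, a random symmetric 4×4 array plays the role of the second-derivative operator, which is symmetric because mixed partial derivatives commute. A random antisymmetric array plays the role of the field tensor. The function names and the random seed are arbitrary.

```python
import random

def random_antisymmetric(n, rng):
    """n x n array with m[j][i] == -m[i][j] and zeros on the diagonal."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = rng.uniform(-1.0, 1.0)
            m[j][i] = -m[i][j]
    return m

def random_symmetric(n, rng):
    """n x n array with m[j][i] == m[i][j]."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = rng.uniform(-1.0, 1.0)
            m[j][i] = m[i][j]
    return m

def double_contraction(s, a):
    """Sum of s[i][j] * a[i][j] over both indices."""
    n = len(s)
    return sum(s[i][j] * a[i][j] for i in range(n) for j in range(n))

rng = random.Random(0)
F = random_antisymmetric(4, rng)  # stand-in for the field tensor
D = random_symmetric(4, rng)      # stand-in for the commuting second derivatives
total = double_contraction(D, F)
print(total)
```

Each off-diagonal term is matched by its mirror image with the opposite sign, and the diagonal of the antisymmetric array is zero, so the total vanishes up to rounding error for any choice of the two arrays.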

Using equation (2) on p. 228, one can also prove that Maxwell’s
equations imply conservation of energy-momentum.

Problems


1 (a) A parallel-plate capacitor has charge per unit area $\pm\sigma$ on
its two plates. Use Gauss’s law to find the field between the plates.

(b) In the style of example 2 on p. 221, transform the field to a 
frame moving perpendicular to the plates, and verify that the result 
makes sense in terms of the sources that are present. 

(c) Repeat the analysis for a frame moving parallel to the plates. 


2 We’ve seen examples such as figure a on p. 215 in which a 
purely magnetic field in one frame becomes a mixture of magnetic 
and electric fields in another, and also cases like example 2 on p. 221 
in which a purely electric field transforms to a mixture. Can we have 
a case in which a purely electric field in one frame transforms to a 
purely magnetic one in another? The easy way to do this problem 
is by using invariants. 

3 (a) Starting from equation (1) on p. 220 for $\mathcal{F}^{\mu\nu}$, lower
an index to find $\mathcal{F}^\mu{}_\nu$. Assume Minkowski coordinates and metric
signature $+---$.

(b) Let $v = \gamma(1, u_x, u_y, u_z)$, where $(u_x, u_y, u_z)$ is the velocity
three-vector. Write out the matrix multiplication $F^\mu = q\mathcal{F}^\mu{}_\nu v^\nu$, and show
as claimed on p. 220 that the result is the Lorentz force law.

4 On p. 226 I presented a list of properties of the electromag- 
netic stress tensor, followed by an argument in which the tensor is 
constructed with three unknown constants a, b, and c, to be deter- 
mined from those properties. The values of a and b are derived in 
the text, and the purpose of this problem is to finish up by proving 
that $c = -1$. The idea is to take the field of a point charge, which
we know satisfies Maxwell’s equations, and then apply property 8,
which requires that the energy-conservation condition $\partial T^{ab}/\partial x^a = 0$
hold. This works out nicely if you apply this property to the x column
of T, at a point that lies in the positive x direction relative to
the charge. 

5 Show that the number of independent conditions contained 
in equations (6) and (7) agrees with the number found in equations
(3a)-(3d). 

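6 Show that

$$\frac{\partial \mathcal{F}_{\mu\nu}}{\partial x^\lambda} + \frac{\partial \mathcal{F}_{\nu\lambda}}{\partial x^\mu} + \frac{\partial \mathcal{F}_{\lambda\mu}}{\partial x^\nu} = 0$$

(equation (7), p. 234) implies that the magnetic field has zero divergence.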


7 Write down the fields of an electromagnetic plane wave propa- 
gating in the z direction, choosing some polarization. Do not assume 
a sinusoidal wave. Show that this is a solution of 


$$\frac{\partial \mathcal{F}^{\mu\nu}}{\partial x^\nu} = 0$$

(equation (6), p. 234, in a vacuum).